MEASUREMENT FOR GUIDANCE 





Exploration Series in Education 
Under the Advisory Editorship of 
JOHN GUY FOWLKES 



Measurement 
for Guidance 


JOHN W. M. ROTHNEY 

University of Wisconsin 

PAUL J. DANIELSON 

University of Arizona 

ROBERT A. HEIMANN 

Arizona State College 



HARPER & BROTHERS 


PUBLISHERS, NEW YORK 





If I want to understand an individual 
human being, I must lay aside all scientific 
knowledge of the average man and discard 
all theories in order to adopt a completely 
new and unprejudiced attitude. I can only 
approach the task of understanding with a 
free and open mind, whereas knowledge 
of man, or insight into human character, 
presupposes all sorts of knowledge about 
mankind in general. 


Carl Jung 



CONTENTS 


Editor's Introduction 

Preface 

I. Testing in General and in the High School Counseling Program 

The Counseling Task; The Counseling Program; The Magnitude 
and Scope of Psychological Testing; The Development of Tests 
as Aids to Counseling; Testing in Manpower Utilization and in 
Personal Development; The Place of Tests in Counseling; 
Summary 

II. Varieties and Sources of Tests 

Types of Tests; Sources of Test Materials; Checking on the 
Test Publisher; Summary 

III. Criteria of Test Selection 

Validity; Validity for Counseling; Reliability; Norms; 
Administration of Tests; Scoring of Tests; Mechanical 
Considerations; Summary 

IV. Test Scores: Etiology and Interpretation 

Selection of a Test; Factors Influencing Test Scores; Summary 

V. The Use of Standards in Test Selection 

Technical Recommendations for Psychological Tests and 
Diagnostic Techniques; Application of Recommendations to the 
Cooperative School and College Ability Tests; Summary 

VI. Recording and Reporting Test Scores 

Methods of Recording; Summary 

VII. Combining Test Scores with Other Data 

Combining Test and Clinical Data for Counseling; Information 
Provided by Personal Data; Contradictory Evidence: Cases; 
"That Depends"; Individualized Test Interpretation; Summary 

VIII. Personality Questionnaires and Interest Inventories 

Limitations of Short-Cut Methods; Attempts at Justification of 
Short-Cut Methods; Objections to Short-Cut Methods; Validity 
of Interest Inventories; Reasons for Lack of Validity; 
Validity of Personality Appraisal Techniques; Validity of 
Projective Methods; Reliability of Interest and Personality 
Measurements; Norms of Personality and Interest Inventories; 
Suggestions for the Counselor; Summary 

IX. The Future 

Some Possible Developments; Some Basic Problems in Test 
Development; Current Signs of Progress; Summary 

Appendix: Activities Reports 

Index of Names 

Index of Subjects 



EDITOR'S INTRODUCTION 


Schools, students, and scholarship are in the limelight of the 
press, the pulpit, and home as never before in the United States of 
America. The importance of what happens in educational 
institutions from the first day of kindergarten through the last day 
of formal education is being emphasized in all quarters. Statesmen 
as well as schoolteachers reflect an inspired determination to make 
sure that adequate funds are provided to offer the quality as well 
as the kind and amount of educational opportunity needed for the 
optimum continued development of our country. 

Furthermore, it is recognized that learning must continue 
throughout all adulthood as well as in childhood and early 
adulthood. Increased offerings for "adult education" are being made 
by both public and private institutions. 

The circumstances just cited are indeed gratifying, comforting, 
and inspiring, particularly to all those engaged in educational 
activities. On the other hand, these circumstances are also sobering 
and challenging on many counts, but especially in connection with 
the professional counseling of the individual boy or girl in the 
elementary or secondary school, the college and university student 
and, indeed, parents and grandparents. 

The need for and confidence in professional counseling is evi- 
denced in many ways. Specially trained workers in guidance and 
counseling are found in all types of educational institutions. Many 
commercial agencies also engage in advising men and women as to 
what they "should do," particularly with reference to the "job" 
they long to get. 

Intelligent and valid counseling is possible only if the counselor 
is as familiar as possible with the abilities, habits, strengths, and 
weaknesses of those who are counseled — in short, the counselor 
must know the counselee. One of the bases most commonly used 
by counselors in learning about their counselees is the standardized 
test. 

It is therefore obligatory that professional counselors be wise 
in the selection of tests, skilled in their administration, and show 
deep and discriminating insight in the interpretation of test results. 
In short, a thorough and discriminating familiarity with measure- 
ment is essential for effective counseling. 

This refreshing and stimulating book deals with the functions, 
appropriateness, selection, use, recording, and interpretation of 
tests with respect to their value in the counseling of individuals. 
Weaknesses as well as strengths of tests are treated with the 
delicate discrimination and judgment that come from years of work 
with tests in relation to individual counseling. To the writer the 
highlight of this volume is the last chapter, entitled "The Future.” 

This concluding chapter presents a moving challenge and also 
points the way towards more valid tests and improved use of them. 

All those who work with tests, but especially counselors, will ex- 
perience pleasure as well as professional improvement from using 
and pondering over this work. 


John Guy Fowlkes 



PREFACE 


This volume is intended for use in the preservice and in-service 
training of counselors and others who attempt to use tests in the 
counseling of individuals. The emphasis is on the practical ap- 
plication of measurement in counseling for the large group of 
guidance workers who are familiar with elementary concepts of 
measurement but who are not yet ready for advanced technical 
work in theory. It is assumed that such persons will have had a 
first course in tests and measurements and will have developed 
some competence in the interpretation of elementary statistics. 

Most tests are designed to be used with groups but many authors 
of tests claim that their instruments may be used successfully in 
the educational and vocational guidance of individuals. Thus 
counselors, if they are to employ tests in their work, are required to use 
instruments that may serve their stated primary purposes well but 
may fail completely to meet their second objective. The values and 
limitations of such instruments in counseling of individuals are 
considered fully in this volume. 

Discussion of several widely used tests of individuals has been 
purposely eliminated from this book. The Stanford-Binet and 
Wechsler-Bellevue tests, for example, are not considered here 
because they have been thoroughly discussed by their authors in 
volumes devoted entirely to their construction and use. In addition 
there seems to be sufficient evidence to indicate that they do not 
usually contribute enough per unit of cost in the counseling of 
most of the subjects a counselor meets in schools to justify their 
use, although they may do so in the clinical study of unusual 
individuals. Similar reasons may be cited for the omission of discussion 
of those individual tests that purport to measure such factors as 
finger dexterity and mechanical manipulation. Counselors who 
plan to use such instruments must take special training in their 
use that goes beyond the scope of this volume, although some of 
the basic problems in the interpretation of scores obtained from 
them are similar to those that are discussed in the following pages. 

Samples of items extracted from a test and reproduced in a book 
cannot indicate its nature and value. Study of a whole test and the 
manual accompanying it is essential if the counselor or 
counselor-in-training is to determine whether the test may be useful in his 
work. It is essential, therefore, that a file of tests and measure- 
ments be made available for use by students while they use this 
volume. For intensive study of some tests it seems desirable to have 
tests and manuals available for each member of a class. Samples 
are available without cost or at reduced rates for instructional 
purposes from most publishers of tests. 

In many places throughout this volume the authors have been 
extremely critical of some common practices in measurement while 
they have accepted many of its basic principles with enthusiasm. 
The types of tests and practices that are criticized here are only 
samples of those commonly used. Many millions of tests are used 
annually by counselors despite the fact that their authors present 
no evidence or inadequate evidence that they can be used effec- 
tively in the counseling of individuals. The waste that may result 
and the wrongs that may be done to counselees are matters for 
serious consideration by those who work in the field of guidance. 
It is hoped that the criticism and suggestions presented in this vol- 
ume may result in a demand by counselors that better tests be pro- 
duced. It is believed that the use of such improved instruments 
will increase the effectiveness of guidance workers in their service 
to youth who meet real personal problems and who must make 
decisions that are crucial to themselves and to society. 

John W. M. Rothney 

Paul J. Danielson 

Robert A. Heimann 

August, 1958 



MEASUREMENT FOR GUIDANCE 



CHAPTER I 


Testing in General and in the High 
School Counseling Program 


Throughout this volume emphasis will be placed on the use of 
tests in the school counseling programs. The authors believe that 
the employment of tests by counselors differs from the use made 
by others in terms of concepts, technical requirements, and 
applications. Their use for the special purposes of counseling is the 
major concern of this volume. In the opening chapter a brief 
overview of the general scope of testing will be presented and some 
of the circumstances that make testing in a counseling situation, 
as defined by the authors, different from those found in other 
applications will be discussed. 

THE COUNSELING TASK 

"I don't know just what I want to follow as an occupation; can you 
give me a 'vocational test' that will give me the answer?" 

"I thought I would like to take some tests to see if I can qualify 
for my current choice of college curriculum." 

"I wish you would give my son a vocational aptitude test because 
he can't make up his mind about what he wants to do after his 
graduation." 

Every school counselor has heard these or similar questions 
many times. And many times, perhaps, he has resisted the 
temptation to answer abruptly, "I can't, and only a charlatan 
would. . . ." Yet, if he did so, he might be too harsh on the 
person making this common request, for the applicant usually believes 
that he is taking the right step in requesting some "scientific” 
measurement of his aptitudes and abilities. This procedure seems 
very much the thing to do in the middle of the twentieth century, 
when we seem to be living in the age of the specialist, when we 
have rationalized many of our procedures and behaviors, and when 
we seek "expert" advice about our everyday affairs. 

The counselor, a person with specialized training and a rich and 
varied background, is concerned with all aspects of the 
performance and behavior of his counselees. As he assists them to develop 
their attitudes and feelings in relation to vocational, educational, 
or personal choices, he tries to keep before them the multiplicity 
of choices that are theirs to make and the probable resulting con- 
sequences. He is deeply concerned about them and he communi- 
cates this concern as he works on the development of a counselee’s 
understanding of himself and the making of decisions. 

And these decisions that a mid-twentieth-century youth must 
make have dimensions never before so complex or pressing in 
western civilization. While still in high school he must choose 
with discernment from among thousands of occupations, more 
than one hundred different college training curriculums in some 
2,000 colleges, and from some three dozen ways of meeting his 
national service obligations. These increasingly complex choices 
are his to make. At one time these choices could be made with the 
help of such highly integrative social institutions as the family, 
the stable small town unit with rather fixed, crystallized class and 
occupational structure, and the traditional academic secondary 
school. These institutions are rapidly taking new and strange 
shapes and forms. 

The youth may feel vague unrest and some anxiety over the 
spectre of automation in industry with its increasing threat of 
technological displacement and possible unemployment. He may see 
increasing concentration of industrial power by big industry with 
resultant and discomforting impersonalization. He may see society 
increase its demands for the highly trained specialist who must 
concentrate his technically acquired talents in narrower and nar- 
rower fields. 

The difficulty of making decisions under such circumstances may 
be compounded by an overspreading aura of anxiety, for this 
mid-twentieth-century youth finds his very society torn between 
conflicting world-wide ideologies and his world divided into two 
armed camps. The impact of two world wars, a major world-wide 
depression, and the threat of the H bomb have left their stamp 
upon his parents and his teachers and they may communicate some 
of their uneasiness to him. 

It is in terms of this broad social base that the youth approaches 
his counselor in a school setting for help in choosing among the 
best bets or the most likely avenues of decision in relation to his 
future. It is in this framework that the counselor tries to bring a 
little concern for individuality to this same bewildered youth. And 
although the educational system attended by the young mid-twen- 
tieth-century youth has given lip service to the idea of individual 
differences for several generations, he is likely to find it in reality 
only in the intimacy of a counseling relationship. This deep 
personal concern for each young person as an individual is the prime 
requisite of the recently developed school counseling service and 
the major distinguishing characteristic of the counselor. It comes 
as a healthy antidote to the spreading depersonalization of 
American secondary education and its bell-curve-dominated, 
fractionized, compartmentalized schools with their diverse curricula. The 
counselor stands as the one person, central and warm, to whom 
the youth can relate. The counselor may become a stable 
integrative force in the seas of uncorrelated, unrelated experiences that 
the youth often finds in the high school of today. But it is when 
he attempts to do something to show his deep personal concern for 
each individual by helping him to help himself that the counselor 
finds his testing instruments particularly inadequate and ineffec- 
tive. 

THE COUNSELING PROGRAM 


It is not the purpose of this volume to describe counseling 
programs in general, but some elaboration of common underlying 
concepts and frequently stated assumptions of such programs is 
presented below to indicate the settings in which tests are used by 
counselors. 

The major objectives of guidance programs at the secondary 
level are usually stated in terms similar to these: 

1. "The provision of assistance to all students in order that they 
may recognize their limitations and potentialities, to the fullest 
possible extent, and to utilize this knowledge and development in 
planning their school and post-school careers. 

2. "To coordinate the efforts of home, school, and community 
to assist students toward the goals named above." ¹ 

¹ John W. M. Rothney and Bert A. Roens. Guidance of American Youth. 
Cambridge: Harvard University Press, 1950. 

The nature of the individual student is such that the successful 
attainment of these objectives can be realized only to the extent 
that counseling becomes an individualized affair. This point is 
made clear by Rothney and Roens in their statement that: 


Counseling must always be an individualized affair. . . . The word 
"always" is used advisedly for the foundation of counseling is found 
in the fact that there are personal choices to be made. In many cases 
there are similar situations and patterns of development which 
require similar choices, but, in the last analysis, there must be some one 
person who accepts the responsibility of helping this particular 
individual to analyze his unique personal problems. To such situations 
someone must bring particularized knowledge obtained from records, 
observations, and tests, and someone must interpret it. Someone must 
answer the student's specific questions, and someone must raise 
particular questions that he may not have raised about himself. 
Someone must interpret to each student separately specialized 
educational and vocational implications which he, because of his lack of 
experience and knowledge, is unable to recognize, and someone must 
help each student to appreciate the social and domestic circumstances 
of his particular characteristics and situation. Someone with quick 
personal perceptions and a sympathetic interest in human difficulties 
must help a student to help himself when he finds that he is 
confronted with problems beyond his power to solve. . . . It is these 
personalized tasks, then, that the counselor, who has only a token 
teaching assignment and who has had specific training, will 
undertake.² 


In carrying out this assignment it is assumed that administrative 
arrangements will provide the time and facilities to enable the 
counselor to confer with his subjects frequently over the years that 
they are in school. It is also assumed that enough rapport has been 
established so that the students feel that assistance will be pro- 
vided if it is sought. 


THE ROLE OF THE COUNSELOR ³ 


The central figure, the person, the "someone" referred to in the 
section immediately above is the counselor who, in assuming his 
role, will work in a face-to-face and one-to-one relationship with 
the counselee in which he will "consciously attempt by verbal 
means to assist [the student] in modifying attitudes and other 
behavior with respect to educational, emotional, and vocational 
issues." ⁴ 

² John W. M. Rothney and Bert A. Roens. Counseling the Individual Student. New 
York: Dryden Press, 1949. 

³ In the [...] to the 
task of the counselor will have to be quite arbitrary. The questions of who is the 
counselor and what he does have been disturbing for many years, and most attempts 
to spell out the task of the counselor have resulted either in something quite vague 
or in a synthesis so general as not to be especially helpful. Those who have been 
engaged in something called "counseling" will recognize the problem; those in 
training will probably experience it in future moments of introspection if, indeed, 
they have not already done so in their search of the literature. 

At best, it would appear that any answer to the question "What is the task of the 
counselor?" must be prefaced with ". . . that depends." It will depend on the 
extent of training [...] 
was obtained, the "philosophy" of the administration under which the counselor 
is working, the facilities at his disposal, the referral and allied agencies with which 
he can work, whether the counselor is THE guidance program or a "service plus" 
among many other services that may be provided. Because of the many variables 
involved it is not difficult for the reader to see why no wholly inclusive and mutually 
acceptable definition of duties has been evolved. 

⁴ John W. M. Rothney and Paul J. Danielson. "Counseling." Review of 
Educational Research, April, 1951, 21:132-139. 

QUESTIONS TO BE CONSIDERED 

During interviews the counselor will be required to perform 
many different functions. Some of the verbs that may be used to 
describe his activities are: interpret, inform, listen, describe, 
compliment, encourage, refer, demonstrate, provide, assist, and confer. 
Any list of this kind cannot be complete because special 
circumstances will require special action. He will do these things as he 
seeks to fulfill the objectives of his program. As the counselor goes 
about his duties he will find that the problems raised and the issues 
discussed will tend to run the gamut of human difficulties as his 
subjects try to find their way among the forests of their own desires 
and the roadblocks that society puts up against them. He will find 
that he must answer the questions of students, parents, school 
personnel, potential employers, and personnel of institutions for 
advanced training. He will find that he must help these individuals 
to find answers to their own questions and he will realize that he 
must raise questions that these persons have not known enough to 
ask. He will discover, too, that he must seek answers to questions 
that arise to him as he works with these persons. 


The questions that are asked, or sometimes just implied, are so 
numerous, unique, and varied that they defy listing or 
classification. The questions given below are some that counselors often 
meet, but they must be considered only as samples. 

QUESTIONS BY STUDENTS 

1. Am I capable of undertaking certain training successfully? 

2. Why can't I do this work well? Why am I having trouble with 
this course? 

3. Do I have any particular strengths or weaknesses? 

4. Could I get into officer training when I get in the armed services? 

5. What are my chances of winning a scholarship? 

6. Do you think I could pass the draft deferment test if I go to 
college? 

7. My teachers say I am not working up to my ability. What do they 
mean? 

8. Why can't I learn things in some courses when I can in others? 

QUESTIONS BY PARENTS 

1. Is my child capable of undertaking a certain kind of training 
successfully? 

2. Why is my child having difficulty in his training? 

3. Could my son be successful as a mechanic (or any other specific 
occupation)? 

4. In view of my child’s health would it be wise to plan for training 
beyond high school? 

5. Should my child be accelerated? Should he repeat some of his 
work? 

6. Is my child better fitted for one kind of work rather than another? 

QUESTIONS BY SCHOOL PERSONNEL 

1. Why is this student not working up to capacity? 

2. Why is this pupil having difficulty in my course? 

3. Wouldn't it be better to guide this student out of a particular 
program of study? 




4. Is this pupil actually unable to do this work or is it just a case of 
not trying hard enough? 

5. Shouldn’t this student be in a fast section? a slow section? 

6. Is this pupil really as dull as he seems? 

QUESTIONS BY PERSONNEL OF 
INSTITUTIONS FOR ADVANCED TRAINING 

1. Does this applicant show promise of success in our institution? 

2. Has this student shown any particular strengths or weaknesses? 

3. Is this pupil well enough prepared to undertake a particular course 
of study? 

4. Has this student developed good habits of study? 

5. Is this student worthy of scholarship aid? 

6. Is the applicant sufficiently mature to undertake this training? 

QUESTIONS BY EMPLOYERS 

1. Would this applicant do better in one phase of our work 
(mechanical) than in another (clerical)? 

2. Can you provide us with information that will assist us in the 
placement of this individual within our organization? 

3. Our training program within our company provides the following 
opportunities. (They are described.) In which of these do you 
think we ought to place hint? 

4. How does he get along with others? 

5. We are interested in potential leadership — How will he respond 
to training? 


QUESTIONS BY THE COUNSELOR 

1. In view of this student's stated choices, what educational program 
would seem best for him? 

2. During the time left in school what are the probabilities that his 
performances may change? 

3. What explanation can be found for the apparent inconsistencies in 
this student's record? 

4. Is this student’s visual condition such that he should not under- 
take a lengthy program of training that requires much reading? 



SOURCES OF ANSWERS TO THE QUESTIONS 

As the counselor sets out to find the answers to such questions, 
and to help others to obtain them, their variety and complexity 
demand that he turn to many sources, procedures, and techniques. 
He will begin his search with acceptance of the fact that there is 
no single procedure that will guarantee success, no one source that 
will be infallible, and no particular technique that can always be 
applied. Except in the simplest situations the counselor will seldom 
find that there is a clear-cut course of action. He will usually find 
that he and his questioner must embark jointly on an enterprise 
that will be complex and time-consuming over a considerable 
period. 

It is impossible in the current status of our knowledge about 
human beings to decide, before conferring with a student, what 
information about him needs to be obtained so that his questions 
can be answered. Rothney and Roens ⁵ have set up some general 
guides for the collection of information about students. They 
include emphasis on the need for finding out what is particularly 
important to each counselee, the need for longitudinal data about 
him, the necessity of having information about his cultural milieu, 
the attitude of holding final conceptualization of the person in 
abeyance till all possible sources of information have been 
examined, and the willingness to appraise all sources of information 
thoroughly before they are used. These, however, are general 
guides and the specifics must depend on circumstances and 
conditions. To answer the questions of a counselee who comes initially 
because of vocational indecision the counselor may have to 
consider, among other things, his subject's health, previous 
performances and experiences in related areas, the financial and social 
circumstances of his home, the opportunities for training and 
employment in the areas considered, his usual behavior and 
significant variations from it, his enthusiasms, his social adjustability, 
and anything about him as a particular person that may assist in 
resolving the indecision. 

⁵ Rothney and Roens, op. cit., pp. 48-64. 

The sources to which the counselor may turn will then be many 
and varied. No classification of such sources can ever be adequate 
but the general listing of them below may serve to point out their 
scope. In general the following may be used. 

1. Interviews with: 
   The counselee 
   Parents 
   Peers 
   Employers 

2. Study of the counselee's records: 
   School 
   Employment 
   Health 
   Activities in the community 

3. Examination of past performance of the counselee: 
   Personal documents 
   Productions as in art, music, shop 

4. Files of occupational materials: 
   General information 
   Local data 

5. Source books: 
   College catalogues 
   Handbooks 

6. Test performances 

THE MAGNITUDE AND SCOPE OF PSYCHOLOGICAL TESTING 

There has probably been no accurate count of the number of 
psychological tests administered annually, but it is evident from 
some of the figures reported that we are rapidly reaching the point 
at which few persons will have escaped their influence during their 
lifetime. The following estimates will give the reader some idea 
of the magnitude of "operation testing" as it has developed in 
recent years. 

In a recent brochure prepared in the observance of the fiftieth 
anniversary of a major test publisher it was estimated that "up- 
wards of 75,000,000 standardized tests are given annually in the 
schools." ⁶ Estimates of another test publisher ⁷ place the figure at 
more than 100 million, or an average of three tests per pupil at 
the elementary and secondary school levels. Recently a representa- 
tive of a third major test publisher displayed with some pride, 
while visiting one of the authors, a facsimile check in the amount 
of some $56,^00 which represented the cost of a test order from 
a single large city school system. 

The magnitude of educational testing can be noted further in 
excerpts from the annual report of the Educational Testing 
Service.⁸ In its report for the year 1954-55, it was indicated that some 
171, d44 individuals were tested by the College Entrance Examina- 
tion Board between August 11, 1954, and May 21, 1955; that some 
4,000 were tested for admission to the United States Air Force 
Academy in March, 1955, 14,000 were tested for the General 
Motors National Scholarship plan, and 22,300 civilian candidates 
were examined for the Naval Reserve Officer Training program. 
Examinations were prepared and administered to applicants to the 
United States Military Academy, the Coast Guard Academy, and 
the United States Merchant Marine Academy. The Cooperative 
Test Division of the Educational Testing Service, through its 
Freshman and Sophomore programs, planned, scored, and inter- 
preted 532,426 college-level tests between the fall of 1953 and 
spring of 1955. In the year ending June 30, 1955, 51,789 tests 
were administered as a part of the Graduate Record Institutional 
Testing Program. National Teacher Examinations were 
administered in 1954-55 to 9,165 candidates, and the Medical College 
admission test was given to 12,646 candidates. Between July 1, 
1951, and June 30, 1955, 1,144,777 tests were given to 354,818 
candidates in the total ETS Supervised Testing Program, Special 
Testing Programs, and Institutional Testing Programs. These 
figures do not represent the tests sold outright for use by agencies 
that did their own testing. The figures suggest something of the 
number of persons involved in testing, but perhaps of greater 
significance are the implications of the purposes for which they were 
given. 

⁶ Standardized Testing: An Adventure in Educational Publishing. Yonkers, N.Y.: 
World Book Co. No date. 

⁷ Lyle Spencer. Guidance Newsletter. Chicago: Science Research Associates, 
January, 1955. 

⁸ Educational Testing Service. Annual Report. Princeton, N.J., 1954-55. 

Business and industrial concerns have become major consumers 
of tests. Because many industrial tests are custom-designed and not 
distributed through usual channels, an estimate of the total 
industrial testing operation is difficult to make. Some of the numbers 
used may be obtained, however, from such sources as the fol- 
lowing. In an article dealing with the increasing use of tests in 
industry, Whyte* reported that 1,000,000 copies of a personality 
questionnaire were sold by a single distributor in one recent year. 

in a review of testing done by management, reported 
that one “Aptitude Index" devised by the Life Insurance Agency 
Management Association had been administered ttVnalf a imlUon 
prospective salesmen by 174 life insurance companies since 1938. 
The survey further indicated that some 560 industrial tests were 
available. In 1950 the Psychological Corporation sold mote than 
$600,000 Worth of services to industry. In ten years of operation 
(up to 1950), the Klein Lislitute for Aptitude Testing sold 100,- 
000 test batteries to more than 800 companies. The importance of 
itsi lesnVts \o 'dne indiviauals involved is implied in one example, 
where the survey reports that "Social Research, Inc. has sold the 
TAT (Tbemalic Apperception Test) to about 75 customers, some 

William H. Whyte, Jr. "The Fallacy of Personality Testing." Fortune, September, 1954.

"The Tests of Management." Fortune, July, 1950, 42:92-96.




of whom will not make an executive change without administering
the test.”

Many "consulting firms" have found a ready market for psychological
testing in industry. Many industries have employed their
own psychologists to aid, through testing, in the selection of em- 
ployees. They use some of the hundreds of tests available commer- 
cially or devise tests to meet their own needs. 

The figures given above could be supplemented by those from 
other major users of psychological tests. The reader may be one 
of over 18 million men who, during and since World War II, took 
two or three tests, such as the Army General Classification Test, 
the Radio Operator's Aptitude Test, the Mechanical Aptitude Test, 
or the battery administered to enlistees in the navy, in the air corps 
classification program, and those that may have accompanied ap- 
plications for Officer Candidate Schools, or other special assign- 
ments. Other readers may be among those presently in college 
rather than in the armed forces because of their performance on 
the Selective Service Exam. Some readers, while they were seniors 
in high school, or as registrants for work, may have participated 
in the testing program sponsored by the United States Employment 
Service, in which some half a million youth are tested each year.
And there are certainly among the readers of this volume persons 
who were administered a battery of tests when they made application
for training under the GI Bill.

Many more samples of the use of tests could be presented here,
but enough have been given to indicate the extent to which psychological
testing is touching the lives of great numbers of individuals.
While the phenomenal growth in the testing movement has
not been a recent development, the greater part of the movement
has taken place since World War I and particularly since the beginning
of World War II. With this overview of the general

p. 104.

Beatrice J. Dvorak. "The General Aptitude Test Battery." Personnel and Guidance
Journal, November, 1956, 35:145-156.




growth of the testing movement in mind we may now turn to the
more specific study of the growth of the use of tests in counseling.

THE DEVELOPMENT OF TESTS AS AIDS TO COUNSELING

Originally, testing, and more specifically mental testing, was
used for classifying individual children as feeble-minded or normal.
Within the first 25 years of the twentieth century, the mental
testing movement adopted the same mass procedures and large-scale
standardization that became common to many other processes
on the American scene. To meet the need to classify 2 million
soldiers under the emergency conditions of World War I, group
mental tests and group adjustment inventories were developed.
The scores derived from the group tests approximated the scores
from the individual mental tests but they also lost the sensitivity
of the slower, more time-consuming methods.

As an aftermath of the war, group tests of all sorts blossomed
forth in almost every conceivable area and the mass testing movement
of the 1920's was born. Group tests of mental level and
achievement were widely used in education. Industry that had long
undergone a specialization and standardization process attempted
to find in the mental testing movement a ready answer to its needs
for easy and rapid employee selection. The concept of "finding the
square peg for the square hole" became popular, and the psychological
aptitude test was proclaimed as the best, and certainly the
speediest, instrument for appraising the peg. Tests were developed
for such occupations, among others, as streetcar conductors, gum-wrapping
machine operators, and drill press operators. A job applicant
in industry was just as likely to face an Otis Intelligence
Test as a high school youth who was to be assigned to a fast or
slow section of American history.

Basic to this new concept of management in industry and the
measurement movement in education was the actuarial concept
that through the testing of large numbers of people and determination
of their average performances, standards could be set up.
From these performances cutting scores were derived and a candidate
who scored below them was regarded as a poor risk for job
success or for training. Literally thousands of correlation coefficients
indicating low relationships between occupational or academic
performances of subjects and their scores on tests appeared
in professional journals in the 1920's to the 1940's. Behind each
ran the philosophical thread, ". . . everything which exists can be
measured . . . and everything that can be measured exists." This
was indeed empiricism run rampant!

John Dewey, writing in 1928, cautioned that the primary responsibility
of school was the encouragement of diversity of performances
rather than the uniformity that averaging of scores
encourages. And he viewed with alarm the increasing efforts toward
"scientific" measurement in schools. As statistical sophistication
became more widespread other writers saw the dangers of
the nomothetic or actuarial approach to the solution of problems
of prediction. Recently Rothney and Roens pointed out that
". . . regardless of the general relationship between two variables
expressed by a correlation coefficient, it is possible to find relationships
within a particular counselee that run counter to the pattern
indicated by the coefficient even to the degree of complete reversal
of it; to find, within one person, the amount of correlation between
characteristics which is common to the whole group; and even to
find within one person, closer relationships than would be expected
in view of the size and direction of the coefficients obtained from
mass data."
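The contrast between a group coefficient and the relationship within one person can be illustrated with a small computation. The sketch below is only an illustration under stated assumptions: every figure in it is invented for the example and none comes from Rothney and Roens, and the names `pearson`, `group_scores`, and `one_person_scores` are hypothetical. It computes a product-moment correlation for a group of students, then for repeated observations of a single counselee whose pattern runs counter to the group's.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented "mass data": one (test score, later grade) pair per student.
group_scores = [40, 50, 60, 70, 80, 90]
group_grades = [55, 60, 58, 72, 75, 88]
group_r = pearson(group_scores, group_grades)

# Invented repeated observations of ONE counselee on the same two measures.
one_person_scores = [60, 65, 70, 75]
one_person_grades = [80, 76, 71, 66]
one_person_r = pearson(one_person_scores, one_person_grades)

print(f"group coefficient:      {group_r:+.2f}")
print(f"within-one-person:      {one_person_r:+.2f}")
```

With these invented figures the group coefficient is strongly positive while the coefficient within the single counselee is strongly negative, which is the kind of reversal the quotation describes.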

As the counseling movement and counselors clarified intents
and purposes, it became clear that while the actuarial
approach permitted fairly accurate predictions of group performances,
the counselor was required to consider that fraction of the
John Dewey. "Progressive Education and the Science of Education."
Progressive Education, August, 1928, 5:197-204.

Rothney and Roens, op. cit., p. 19.




population for whom the prediction was not accurate just as much
as he was concerned with the performances of the group as a
whole. And with this recognition of his task, he became as much
impressed with the optimum human development of each of his
counselees as with the maximum utilization of manpower of a
particular class of individuals. It became difficult to keep the
proper perspective of dedicated interest in the individual counselee
and the validation of clinical "hunches" about his future possibilities,
when the generally accepted practice was one of assigning an
index number to an individual and reading his predicted outcome
from an actuarial table. While these practices seemed to work
with a fair degree of success for such groups as the entering class
of freshmen at a large university or training groups in the armed
services, these procedures failed to predict the subsequent performance
of many subjects on the job or in training.
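The difference between predicting for a group and predicting for each person can be made concrete with a cutting score. The following is a minimal sketch, assuming an invented set of ten applicants and an arbitrary cutting score of 56; the names `applicants`, `CUTTING_SCORE`, and `mispredicted` are hypothetical and the data are not drawn from any study. It counts how often the cutting score predicts correctly and lists the individuals for whom it does not.

```python
# Each tuple is (test score, whether the person later succeeded).
# All values are invented for illustration.
applicants = [
    (35, False), (42, False), (48, True), (51, False), (55, False),
    (58, True), (63, True), (67, False), (72, True), (80, True),
]

CUTTING_SCORE = 56  # assumed cutting score; at or above it predicts "success"

# Pair each person's actuarial prediction with the actual outcome.
results = [(score, score >= CUTTING_SCORE, succeeded)
           for score, succeeded in applicants]

hits = sum(1 for _, predicted, actual in results if predicted == actual)
accuracy = hits / len(results)

# The people the counselor must still be concerned with:
# those for whom the group-based prediction was wrong.
mispredicted = [(score, actual)
                for score, predicted, actual in results
                if predicted != actual]

print(f"correct predictions: {hits} of {len(results)} ({accuracy:.0%})")
for score, actual in mispredicted:
    outcome = "succeeded" if actual else "failed"
    print(f"score {score}: predicted otherwise, actually {outcome}")
```

In this invented group the cutting score is right for eight of ten people, which looks satisfactory as a group figure, yet it is wrong about one low scorer who succeeded and one high scorer who failed.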

That some low-scoring testees did far better on the job or in
training than had been predicted from their scores, or that the
ultimate performance of others who scored high on tests turned
out to be disappointingly low, tended to make the sophisticated
counselor wary of making performance estimates for the individual
based on estimates of performance of groups. Careful search of
many test manuals failed (and still fails) to disclose hints or suggestions
as to actions of a counselor when confronted with this
dilemma. Not completely satisfied with the dismissal by statisticians
of this contradiction with statements about sampling errors
or chance errors or probability estimates or maximum likelihood
estimates, the concerned counselor began to challenge some commonly
held psychometric concepts and to examine them with more
care and detail. Detailed discussion of such matters will be found
in later chapters.

TESTING IN MANPOWER UTILIZATION AND IN 
PERSONAL DEVELOPMENT 

The increasing sophistication of some counselors in the use of




tests has by no means become universal. As suggested in the opening
paragraph of this chapter, varying concepts of the role of testing
and the use of test results still exist. This appears to be true
not only in broad areas of application but also, to some extent,
within the single area of counseling. Super, in a discussion of some
of the differences in concepts underlying European guidance programs
and those found in the United States, has shown the implications
involved in the use of tests as tools for human development
or as manpower utilization. His discussion of the French programs
provides a good example of the "manpower utilization" point of
view and the use and efficacy of tests in its implementation.

During part of our formal discussion the focus of the Committee's
[French] interest was on the question of how we know what numbers
of men and women will be needed each year in each occupation,
how this information is applied at national, state, and local levels in
the planning of educational facilities and programs, and, to a much
lesser degree, how it is applied to individuals in planning an education
and in choosing a field of work. The assumption was made that
the information could be applied to the individual by testing his
capacities and interests and finding out whether or not he qualified to
enter the desired type of educational program. Obviously, if 100,000
secondary school teachers are needed, only the 100,000 best qualified
candidates should be admitted to teachers colleges, and those falling
below the cutting point should be diverted to other types of professional
or occupational education. This type of application to the
individual was more or less taken for granted. . . .

This on the surface might appear to be a highly commendable
and objective approach, except that it ignores the great limitations
of tests as predictive devices, and worse, even if it recognizes the
fallibility of tests, it relegates individual worth and integrity to a
position of secondary consideration.

Donald E. Super. "Guidance: Manpower Utilization or Human Development."
Personnel and Guidance Journal, September, 1954, 33:8-14.




Super's description of the French approach has been confirmed
by Conant.

How sharp the difference . . . between our schools and those of
[free] Europe. We use our schools to activate equality of opportunity.
Europe uses its schools to keep wide the gap between those
who will make a living by their hands and those who will make a
living using their brain.

For children up to the age of 11 (in some cases 12 or 13), Europe
provides good common schools. Then comes the tragic break. Children
are sifted by tests and examinations to decide who will go on to
secondary school and university and who will prepare for work. By
this method, nine children out of ten are declared ineligible for
further education and are shunted into training.

The "manpower utilization" concept and its consequent implication
about the use and high efficacy of tests is not, of course,
peculiar to European guidance and educational programs. To a
considerable extent, this viewpoint was basic in the use of tests by
the military in this country during World War II. The use of tests
for such purposes in the war under an emergency is understandable
where screening of large numbers of men must be accomplished
rapidly. The lack of time, the urgency of the situation, and the
need for rapid and effective training demanded methods that could
be applied to masses of men in the hope that the number of "successes"
would exceed the results of assignment by lot. Here, of
course, in the total job to be done (winning a war), personal
wishes and aspirations were of secondary consideration.

To a considerable extent, a parallel to the military use of psychological
tests is found in current industrial testing. Here again
expediency dictates practice. Where the training of employees is
expensive ($1,000 to $10,000 per man in some cases), the desire
of management and stockholders to reduce failure of employees
on the job is understandable. Increasing the number of successes

D.C.: Arthur C. Croft Publications.




over the number of failures shows up on the profit and loss statement,
and even a slight improvement over chance selection in industry
may justify the use of tests.
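The arithmetic behind this argument can be sketched briefly. The example below is a rough illustration under stated assumptions, not a costing method taken from the sources cited: the function `net_saving` and every figure passed to it (number hired, success rates, cost of a failed trainee, cost per test, number tested) are hypothetical. It compares failures avoided by testing with the cost of giving the tests.

```python
def net_saving(hires, success_by_lot, success_with_test,
               cost_per_failure, cost_per_test, applicants_tested):
    """Money saved by testing, relative to assignment by lot.

    success_by_lot and success_with_test are proportions of hires
    who succeed under each method of selection.
    """
    failures_avoided = hires * (success_with_test - success_by_lot)
    return failures_avoided * cost_per_failure - cost_per_test * applicants_tested


# Assumed figures: 100 men hired, success rising from 50% to 55%,
# $1,000 lost on each failure, 300 applicants tested at $2 each.
saving = net_saving(
    hires=100,
    success_by_lot=0.50,
    success_with_test=0.55,
    cost_per_failure=1000,
    cost_per_test=2,
    applicants_tested=300,
)
print(f"net saving: about ${saving:,.0f}")
```

Under these assumed figures a gain of only five successes in a hundred more than pays for the testing. The calculation says nothing about the five in a hundred as individuals, nor about those rejected, which is the distinction the chapter goes on to draw.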

It has not been the intent of the authors to question the value 
of tests in the situations described, or in other applications outside 
the counseling in schools. The intent here has been to suggest that 
situations dictate the flexibility or rigidity of technical demands 
on the instruments used. As suggested earlier, the counseling situation
has its unique demands, differing at times from those of industry.
Its objectives are dictated, at least in part, by the philosophy
of education that emphasizes human development. Some of the
factors that contribute to making the school counseling setting
different from that, for instance, of the military or industrial, are
summarized in Table 1.


Table 1. Factors Contributing to Counseling

Testing for Counseling (Secondary School) compared with
Testing for Selection (Industry, Military, etc.)

1. Counseling: Concern with all members of a particular situation,
   i.e., all students in a given high school regardless of range
   of performances and characteristics.
   Selection: Concern with limited numbers of applicants for work
   within a specific organization, with some screening, perhaps,
   involved in the nature of job announcements and specifications.

2. Counseling: Unique concern with one individual at a time.
   Counseling is an individual affair. Averages or percentages of
   success are of little comfort to those who are not successful.
   Selection: Testing that improves over chance selection pays off
   in terms of production, the company's primary concern. Where
   there is a gain of even one successful employee over failures,
   testing may more than pay for itself.

3. Counseling: Same obligation to all students. Individually, they
   are very much present and a working part of school organizations.
   Counselors cannot turn them away.
   Selection: In a sense no obligation to any applicant and
   especially none to those not selected. Future or next steps of
   those rejected in a hiring situation require no further contact,
   or the formulation of alternate plans.

4. Counseling: Students are going into a future the dimensions of
   which are not known. Counselor working with many variables and
   with many unpredictables.
   Selection: Selection made into a defined situation, usually a
   specific job with dimensions established.

5. Counseling: Concern with the individual for his own sake, his
   worth as an individual, his successes.
   Selection: Not concerned with individual as such; who does the
   job not as important as getting the job done by anyone.

6. Counseling: Many variables in persons and situations appear over
   a long period of time. Demands of society and differences in
   the definition of success by society do not permit many
   generally accepted definitions of success.
   Selection: Selection testing can prove itself over the long
   range. It is effective if, over the long run, more successes
   than failures are picked.

THE PLACE OF TESTS IN COUNSELING

In this volume the reader will be constantly reminded that his 
function is to serve individuals as individuals and that he is not 
employed primarily as a selector for industry or advanced educa- 
tional institutions. It is implied that if he does the first task well 
the selective processes may be improved. It will also be noted 
throughout that the counselor will be working primarily with so-called
normal cases in the American school rather
than with pathological, "clinical," or "disturbed" cases in hospitals
or with the applicant for employment in an industrial setting. It is
implied that his effectiveness with normals and in the school situation
may be improved by use of tests.

It is conceivable that much good counseling can be done without
use of tests and, indeed, there is evidence that it was done many
centuries before standardized tests were available. If all tests were
currently eliminated from counseling it is unlikely that society
would recognize the change for many years. Business would go on,
schools would continue, and millions of young persons would be
satisfied with their choices of training, occupation, marriage partners,
and leisure-time activities. Research in education, guidance,
and psychology has not clearly and conclusively demonstrated that
the use of tests has increased the welfare or productivity of any
significant numbers of persons despite the fact that millions of
them are used annually. There is some evidence that they may be
of assistance occasionally in answering the questions of persons
who score at the extremes of test distributions but they do not




provide final answers. They may, for such persons, help in the
working out of probabilities, odds, or best bets in general on the
average, on the whole, and other things being equal (which they
seldom are), and in doing so they may become valuable sources of
supplementary information about students.

It seems likely that if tests are to be useful in counseling they 
will be so only insofar as they have been selected for use in answering
specific questions of particular counselees. Since certain kinds
of questions, such as those that refer to reading performances, are 
likely to be asked by many subjects, it may be desirable to adminis- 
ter a reading test to large groups of subjects at one time. Except 
in seeking answers to very commonly asked questions, however, it 
seems that testing is likely to be most useful when a plan of 
tailoring the testing program to individual cases is employed. It 
would, for example, be a considerable waste of time and money 
to give a so-called mechanical aptitude test to all the students in 
a general public high school because only a small sample of the 
counselees, their parents, their potential employers, or school personnel
will or should raise questions about their mechanical performances.
Those about whom such questions are raised may be
tested individually or in small groups with, perhaps, some benefit. 

The statements that appear above were designed to suggest to 
the reader that, with a few exceptions, testing for counseling is a 
differential, not a mass procedure. It is indicated that testing will 
be focused on particular individuals, that tests will be selected 
when there is a specific question to be answered. Tests will be used 
when other sources do not provide answers. "When in doubt,
punt," says the football coach; and "When in doubt, lead trump,"
says the bridge player. Though neither is an infallible rule, the
counselor may take a cue from them and decide that "When in
doubt, test." When he does so he may sometimes score well and
he may be able to assist counselees and others who are concerned 
with their welfare to answer questions (in terms of probabilities) 
that will help them to make wise choices. 




In the following chapter we shall be concerned with the numbers
and kinds of measuring instruments available and some of
the problems involved in the selection of the most serviceable 
among them. 


SUMMARY 

In this chapter it has been suggested that psychological tests are 
used in great numbers each year in a variety of situations, each 
with its own rationale, and with varying implications for the lives 
of the persons involved. The nature of the individual places unique 
demands upon the application of psychological tests in counseling.
The counselor must be prepared to help one student at a time with
a wide range of problems and in the making of many decisions.
In fulfilling this role he must turn to many sources of data and
to a variety of techniques, no one of which is infallible. Psychological
tests, as one of the sources of information about individuals,
may assist the counselor in answering some of the questions with
which he is faced. They are likely to be most useful where they are
selected for use in answering specific questions about particular
counselees and to the degree that it is recognized that testing in
counseling is a differential and not a mass procedure.

Discussion Questions and Exercises
1. There is evidence that an increasing number of institutions of
higher learning are turning to the use of psychological tests in
admission procedures with a view to holding down enrollments.
Such selection devices may result in some being accepted and
failing and others being rejected who might have succeeded. What
are the social implications of such a policy? How does such a
policy relate to the concepts of human development? Manpower
utilization? Who, if anyone, assumes the responsibility for each of
the types of persons above and what could be done about the
situation as described?

2. Assuming you, as a counselor, are asked one of the questions listed




on pages 7—8 (select one). What would you want to know 
about the individual, and what questions would you ask yourself 
about the situation before attempting to help the person to reach 
an answer or make the decision implied by the question? 

3. The following request is the sort that is frequently received by 
guidance departments: 

Dear Mr. Jones: 

My son, age 16 and a Junior in high school, does not seem to 
know what he wants to do when he graduates from high 
school and I am concerned about it. I understand that there
are tests to tell a youngster what he should do and I would 
appreciate it if you could arrange to have him take them. 

Sincerely, 

John Q. Smith 

Prepare a reply to the letter. Keep in mind that the person making 
the request is not likely to understand professional jargon. 

4. The following tables were recently drawn up from results of a
questionnaire survey by a state department of education of testing
practices in Grades 1 to 8 of the schools in sixty counties of one
state. In order to get some uniformity in the interpretation of the
items on the questionnaire, at least one conference was held with
the person who was to answer it by a member of the staff of the
state department of education. Comment on the purposes listed in
Table 2 with respect to the relative rank of importance as indicated
by the frequencies for each of the categories. If you were to rearrange
them in the order in which you thought tests would be
most useful what rank would you give each of them? Would you
eliminate any? If so, which and why? Would you recommend any
changes in the kinds of resource persons (see Table 3) they
would use? Why? What factors seem to have influenced the choice
of tests listed in Table 4? Using the general concepts suggested
under "Testing in Manpower Utilization and in Personal Development,"
how would you classify each of the stated purposes?




Table 2. Purposes of the County Testing Program


Purposes

Number of Counties

Measuring general achievement of individual pupils
Diagnosing pupils' needs and abilities
Grouping pupils for instruction
Determining mental ability
Determining basis for remedial instruction
Securing data for guidance purposes
Aiding teachers in self-evaluation
Deciding whether curriculum objectives were met
Deciding on promotion of pupils
Evaluating achievement on a county-wide basis
Determining readiness
Helping supervisors in work with teachers and/or pupils
Confirming teachers' judgments of pupil progress
Interpreting pupil progress to parents
Providing a basis for grading
Developing pupil test "awareness"
Making general comparisons
Providing defense against public criticism


45 

38 

35 

30 

23 

20 

13 

9 

9 

8 

8 

7 

6 


4 

3 


Table 3. Resource Persons Used by Counties in Planning of Testing Programs

Resource Persons                                      Number of Counties

State department of education personnel
Testing company representatives
College or university personnel
School personnel from other counties
Principals of local schools
Psychologists from guidance or welfare departments
State employment service                              2
Textbook publishing company representatives


Table 4. Names and Frequencies of Mental Ability Tests Used
in County Testing Programs


Names of Tests

Number of
Counties

California Test of Mental Maturity

Otis Quick-Scoring Test of Mental Ability

Kuhlmann-Finch

30

12

Pintner-Cunningham


Henmon-Nelson Test of Mental Ability

Davis-Eells

2

Chicago Non-Verbal

1

Kuhlmann-Anderson

1 

1 


28 

23 

19 

10 



5. The following statement is from a pamphlet related to the use of 
tests in employment situations: 

A well-chosen psychological test may be thought of as a written
interview. The group test possesses the added advantage of lending
itself to mass-production methods in the obtaining of information
needed in the placement of new-hires and in appraising the
desirability of transferring an unsuccessful or disgruntled employee
already on the job.

To be a valuable tool in the hands of the employment interviewer,
a test must tell what would be found out when the supervisors'
ratings start coming in. Tests give this information in
fewer minutes by the clock, sooner by the calendar, and at a lower
per-employee cost. The group test is especially valuable in that it
permits the simultaneous "interviewing" of 25 to 100 applicants,
thus enormously reducing delays in getting new-hires on the job.

What assumptions regarding the efficacy of testing are apparent
in the statement? How does the viewpoint presented above contrast
with that of individual counseling? Is the argument of time-saving
tenable even in employee selection? Would you agree that
"a well-chosen psychological test may be thought of as a written
interview"?


References 

Cronbach, Lee J. Essentials of Psychological Testing. New York:
Harper, 1949, Chapter 11.

Darley, John G., and Anderson, Gordon V. "The Functions of
Measurement in Counseling." In E. F. Lindquist (Ed.), Educational
Measurement. Washington, D.C.: American Council on
Education, 1951, pp. 68-84.

Doppelt, Jerome E., and Bennett, George K. "Reducing the Cost of
Training Satisfactory Workers by Using Tests." Personnel Psychology,
April, 1953, 6:1-8.

Fortune. "The Tests of Management." Fortune, July, 1950, 42:92-96.

Floyd Ruch. How to Use Employment Tests. Employment Testing Bulletin
No. 1. Los Angeles: California Test Bureau, 1944, p. 3.




Jones, Arthur J. Principles of Guidance. New York: McGraw-Hill,
1945, Chapter 2.

Mathewson, Robert H. Guidance Policy and Practice. New York:
Harper, 1955.

Rothney, John W. M., and Roens, Bert A. Counseling the Individual
Student. New York: Dryden Press, 1949.

Rothney, John W. M., and Roens, Bert A. Guidance of American
Youth. Cambridge: Harvard University Press, 1950.

Super, Donald E. Appraising Vocational Fitness. New York: Harper,
1949, Chapter 2.

Super, Donald E. "Guidance: Manpower Utilization or Human Development."
Personnel and Guidance Journal, September, 1954,
33:8-14.

Thorndike, Robert L. Personnel Selection. New York: Wiley, 1949.

Warters, Jane. High School Personnel Work Today. New York:
McGraw-Hill, 1956, Chapter 1.

Whyte, William H., Jr. "The Fallacy of Personality Testing."
Fortune, September, 1954, 50:117-119.

Wolfle, Dael. America's Resources of Specialized Talent. Report of
the Commission on Human Resources and Advanced Training.
New York: Harper, 1954.



CHAPTER II 


Varieties and Sources of Tests 


Tests that purport to provide the answers to practically all the
questions that counselors and their counselees are likely to raise
have been published. Examinations of publishers' catalogues reveal
hundreds of tests with titles implying that measurement of
aptitudes for, and performance in, common academic fields and
vocational areas can be done successfully.¹ The bewildering array
of titles of tests with similar content and designated purpose requires
the counselor to develop standards to guide him in the selection
of those that he will use.

TYPES OF TESTS 
CLASSIFICATION AND EMPHASIS 

In the following chapters some criteria to aid in selecting tests 
will be considered. Before those criteria are examined it will be 
necessary to appraise the common types of tests in terms of the 

¹ Measures of interest and personality are not considered here since the instruments
are not tests in the usual sense of the word. They are questionnaires and as
such will be treated separately in Chapter VIII.

27 





26 Measurement for Guidance 

Jones, Arthur J. Principles of Guidance. New York: McGraw-Hill, 
1945, Chapter 2. 

Mathewson, Robert R Guidance Policy and Practice. New York; 
Harper, 1955. 

Rothney, John W. M., and Roens, Bert A. Counseling She Individual 
Student. New York: Dryden Press, 1949. 

Rothney, John W. M., and Roens, Bert A. Guidance of American 
Youth. Cambridge: Harvard University Press, 1950. 

Super, Donald E. Appraising Vocational Fitness. New York: Harper, 
1949, Chapter 2. 

Super, Donald E. ’’Guidance: Manpower Utilization or Human De- 
velopment." Personnel and Guidance Journal, September, 1954, 
53:8-14. 

Thorndike, Robert L. Personnel Selection. New York: Wiley, 1949. 
Waiters, Jane. High School Personnel ]Vork Today. New York; 
McGraw-HiU, 1956. Chapter 1 . 

Whyte, William R, Jr. "The Fallacy of Personality Testing." 

Fortune, September, 1954, 50:117-119. 

Wolfe, Dael. America's Resources of Specialized Talent. Report of 
the Commission on Human Resources and Advanced Training. 
New York: Harper, 1954. 



CHAPTER 


Varieties and Sources of Tests 


Tests that purport to provide the answers to practically all the 
questions that counselors and their counselees are likely to raise 
have been published. Examinations of publishers’ catalogues re- 
veal hundreds of tests with titles inferring that measurement of 
aptitudes for, and performance in, common academic fields and 
vocational areas can be done successfully.’ The bewildering array 
of titles of tests with similar content and designated purpose re- 
quires the counselor to develop standards to guide him in the selec- 
tion of those that he will use. 

TYPES OF TESTS 

CLASSIFICATION AND EMPHASIS 

In the following chapters some criteria to aid in selecting tests 
will be considered. Before those criteria are examined it will be
necessary to appraise the common types of tests in terms of the 

¹ Measures of interest and personality are not considered here since the
instruments are not tests in the usual sense of the word. They are questionnaires
and as such will be treated separately in Chapter VIII.


methods of administering them, the materials used, the perform- 
ances demanded of the subject, and their purported use. 

With respect to administration of the tests, they may be given
to groups or to individuals.

Tests may produce one score or a series of scores. Those that
produce one score may be called unit tests and those that provide
several may be described as a battery.

Tests may consist of miniatures of larger tasks or they may
contain items from which a trait is inferred.

With respect to the materials used in a test, the items may be in
verbal, nonverbal, or apparatus form.

The time factor in testing may be represented by tests in which
a definite time limit is prescribed or those in which a work-limit
procedure is used. In the latter case the subject is given as much
time as he needs to complete the items.

The performances demanded of the subject may test his speed 
in responding, his power within a comparatively narrow area, or
his range of coverage in several areas. 

The extent of generalization about the subject's test scores may
range from a specific report about a particular area such as
spelling to a general indication of as broad a factor as general
scholastic aptitude.

The area measured by a particular test may be described by the
author as scholastic, mechanical, scientific, numerical, cultural,
clerical, and so forth.

The function measured may be variously labeled as achievement, 
aptitude, ability, information, proficiency, etc.

The number of combinations of the categories noted above is
large, and it increases as new tests appear and new titles are
given to tests that cover the areas formerly covered by older tests.
The schematic diagram of Figure 1 has been drawn up to indicate
some of the possible combinations. Using the items in the diagram,
it is possible to classify many of the common tests. Thus the
Stanford-Binet may be described as an individual, unit, trait, verbal



Figure 1. A Schematic Presentation of Common Types of Tests.

and nonverbal, time limit, and power test of general mental ability.
The Stanford Achievement Test, by contrast, may be classified as
a group, battery, trait, verbal and nonverbal, time limit, range test
of specific scholastic achievements.
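
The classification just described can also be written down as a small record, one field for each category. The following is only an illustrative sketch in Python: the class name TestProfile, its field names, and the helper matching are assumptions made for this example and are not part of any published scheme. The two entries simply restate the descriptions of the Stanford-Binet and the Stanford Achievement Test given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestProfile:
    """One test described by the categories listed in this section.

    The field names are hypothetical labels chosen for this sketch.
    """
    name: str
    administration: str  # "individual" or "group"
    scores: str          # "unit" (one score) or "battery" (several scores)
    approach: str        # "miniature" or "trait"
    materials: tuple     # any of "verbal", "nonverbal", "apparatus"
    timing: str          # "time limit" or "work limit"
    performance: str     # "speed", "power", or "range"
    coverage: str        # what the test is said to measure

    def describe(self) -> str:
        """Return the one-line description used in the text."""
        materials = " and ".join(self.materials)
        return (f"{self.name}: {self.administration}, {self.scores}, "
                f"{self.approach}, {materials}, {self.timing}, "
                f"{self.performance} test of {self.coverage}")


# The two examples classified in the text.
STANFORD_BINET = TestProfile(
    name="Stanford-Binet", administration="individual", scores="unit",
    approach="trait", materials=("verbal", "nonverbal"),
    timing="time limit", performance="power",
    coverage="general mental ability")

STANFORD_ACHIEVEMENT = TestProfile(
    name="Stanford Achievement Test", administration="group",
    scores="battery", approach="trait", materials=("verbal", "nonverbal"),
    timing="time limit", performance="range",
    coverage="specific scholastic achievements")


def matching(tests, **wanted):
    """Return the tests whose fields equal every requested value."""
    return [t for t in tests
            if all(getattr(t, field) == value
                   for field, value in wanted.items())]


if __name__ == "__main__":
    catalogue = [STANFORD_BINET, STANFORD_ACHIEVEMENT]
    for test in catalogue:
        print(test.describe())
    # A counselor who has decided on a group battery narrows the list.
    for test in matching(catalogue, administration="group", scores="battery"):
        print("Candidate:", test.name)
```

A record of this kind does no more than make the counselor's specifications explicit; it says nothing about whether a test so labeled actually measures what its title claims.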

IMPLICATIONS FOR SELECTION 

The analysis above was presented to indicate the variety of tests
from which a counselor must choose. Thus if a counselee were to
raise a question about his fitness to undertake an apprenticeship
in a mechanical field and if the counselor decided to use tests to
assist him in answering that question, he would have to make
decisions about such problems as these:

1. Should I use a group or an individual test? 

2. Should I use a test the materials of which duplicate in miniature
the actual mechanical tasks he will be required to perform in
the apprenticeship, or should I use a test that purports to measure
general mechanical aptitude?

3. Should I use a battery of tests (spatial relations, speed of 
manipulation, mechanical reasoning), or will a test that provides 
a single score be adequate? 

4. Should I use tests that require the subject to perform on a 
piece of mechanical apparatus? Can I get the answer that I seek
by having the subject write his responses? Or shall I require him 
to manipulate symbols as in a spatial relations test? 

5. Should I let him take a test that permits him to use as much 
time as he needs to complete the tasks set, or should I choose a test 
with definite time limits? 

6. Should I require him to perform in several areas, or should I 
see how far he can go in selected specified areas? 

7. Should I be concerned with scores in many mechanical areas
that can be averaged to give a general mechanical score or should
I test him intensively only in the special part of the mechanical
field in which he has expressed interest?




8. Should I ignore his performances in scholastic or other areas 
and concentrate entirely on the mechanical field? 

9. Should I be concerned primarily with the information he has
gained in the mechanical area, with the kind of reasoning he does
when presented with mechanical problems, or should I be
concerned with his proficiency in performing certain mechanical tasks
only?

When the counselor has answered these questions he may begin 
looking for tests to fit the specifications set up by the answers to 
his questions. But in the process of answering them he will have 
been forced to consider many of the controversies about the value 
of various testing devices. Some of them are presented in the
following paragraphs.

GROUP VS. INDIVIDUAL TESTS

In deciding whether he should use a group or an individual test 
he must weigh the advantages of saving time by testing several 
individuals simultaneously against the loss of the opportunity to 
observe a particular counselee closely while he is at work. He must 
consider the possibility that his subject might not put forth maxi- 
mum effort in a less closely supervised group-testing situation and 
the chance that he might be disturbed by the presence of others. In 
making the choice between individual and group tests he must 
study the reported difference in the yield of group and individual 
tests and try to determine whether the differences are worth the 
additional time the individual test requires. 

SINGLE SCORE VS. BATTERY TESTS 

When a counselor is trying to decide whether he will use a test
that provides a single score or a battery of tests yielding several
scores he will be forced again to consider the factors of time and
yield. Certain tests labeled as tests of ability or aptitude or
reasoning in mechanical areas purport to give, in a single score, a
measure of what the test has been labeled. This practice is continued
despite the fact that investigators have shown rather conclusively
that it would be better to talk about mechanical abilities rather
than mechanical ability. Such evidence suggests that he will be
forced to use a battery of tests rather than a single test. Since,
however, the battery will be more expensive in time and money, the
counselor must weigh the relative contributions of unit tests and
batteries in his decision.

MINIATURE VS. TRAIT TESTS

As he proceeds in the selection of his tests, the counselor may
have to make a choice between miniature and trait tests. For a
counselee who is considering a career as a machinist the counselor
may, for example, use a two-hand coordination test in a setup
similar to a lathe or he may ask a set of questions designed to
determine whether his subject has become familiar with the
principles involved in lathes and similar machines. In making this
choice he will be faced with the problem of weighing the cost of
each method against the potential increase in yield obtained by
using it. He must also consider whether his miniature can ever fully
represent the whole and whether or not he wants to approach his
problem from as narrow a standpoint as the use of miniatures may
require.

RECOGNITION VS. DEMONSTRATION TESTS

His next problem relates closely to the one in the paragraph
above. Shall he use a picture-type test in which the subject is asked
to show his facility in interpreting pictorial representations of
certain mechanical operations? Shall he use the spatial relations type
of test in which his counselee is required to recognize certain
designs when they are turned in different directions or to recognize
the kinds of designs that may be produced from several component
parts? Or shall he use a piece of apparatus, miniature or otherwise,
on which his counselee will demonstrate his facility? He will need
to examine the research concerning the relative contribution of
each of these kinds of tests to determine which of them, separately
or in combination, best predict the future performances of subjects
in the mechanical field. He will find little evidence of the kind he
seeks.

As he continues to seek answers to the remaining questions he
will, of course, find some overlap with those to which he has
sought answers previously. In trying to decide between work-limit
and time-limit tests the merits and limitations of each of these
testing methods must be appraised. He will find advocates of both
methods and his examination of research on this problem will
probably not result in clear-cut answers to his questions.

GENERALITY VS. SPECIFICITY OF TEST COVERAGE

During the time that the counselor has been seeking answers
to the above questions he will have been concerned with the
generality or specificity of the coverage of the area that he wishes to
cover. He may wish to know whether the counselee is likely to
succeed in one particular mechanical task or in any of several
mechanical tasks or, in rare cases, he may be concerned with both.
In general he will discover that the tendency in testing is to turn
away from global scores and to seek specific scores. He will also find
that, in industry, the trend is toward selecting men who are generally
proficient and who may be taught a specialty after they have been
employed.

TEST TITLES VS. TEST CONTENT 

Finally, the counselor will look at all possible tests in terms of 
what their labels indicate they purport to measure. He will find 
tests that are labeled as measures of mechanical reasoning, me- 




completion of analogies may be a part of a verbal aptitude test in
one case but part of a general mental ability test in another.

The exercise above should have revealed to the reader that he 
must be concerned with test items rather than with test titles. If 
he is still skeptical he should try still another exercise. He should 
arrange to have someone take a sampling of various tests and fold 
their covers back so that the reader cannot see the title. The reader
may then study the items and guess what the title might be. He 
will find little correlation between the titles guessed and the titles 
given. The title on the cover of a test gives little indication of its 
content. 


SOURCES OF TEST MATERIALS 

Now that the counselor has noted the various kinds of tests
available and has seen some of the problems he will meet in making
choices among them, he may inquire about the sources of materials.
In the following pages some samples of basic sources are
called to the attention of the counselor. After he has considered
them he may then turn to references in which the names of
publishers of specific tests are given.²,³

NONPROFIT TESTING AGENCIES


The Educational Testing Service of 20 Nassau Street, Princeton,
New Jersey, is sponsored by the American Council on Education.
The Council is composed of national education associations,
universities, colleges, technological schools, private secondary schools,
city school systems, state departments of education, and other
educational groups. It is a center of cooperation and coordination
whose influence has been apparent in the shaping of American
educational policies and the formation of educational practices for
nearly forty years. Its committees are composed of famous persons
in American education including presidents of universities and
colleges of high standing and, ex officio, the United States
Commissioner of Education.

Within recent years the Educational Testing Service has, under 
the sponsorship of the Council, carried on the testing activities 
formerly done separately by the College Entrance Examination
Board, the Carnegie Foundation for the Advancement of Teach- 
ing, and the Cooperative Test Service. It was organized to serve 
education by developing new areas in which tests were needed, by 
constructing and administering testing programs for various edu- 
cational and government purposes, by conducting research for the 
purpose of advancing test theory and practice, and by providing 
advisory services to schools and colleges. It has provided testing 
programs for many government and private scholarship programs 
and has prepared batteries of tests that are used as admission and 
evaluation instruments for many colleges and graduate schools.⁴

The counselor should become acquainted with the Educational 
Testing Service. Its nonprofit basis and its high quality of research
and offerings demand that one consider its services before selection 
of tests for any purpose is made. 

The Educational Records Bureau, 21 Audubon Avenue, New
York 32, New York, also sponsored by the American Council on
Education, has maintained high standards of testing for large
numbers of private and public schools on a nonprofit basis. It has
maintained a test research staff and its publications contain many
reports of tryouts of commercially published tests.⁵ Although the
recent advocacy by the Bureau of some questionable interest
inventories seems contrary to the high standards it has maintained, its
offerings are generally excellent. Specific information about the

⁴ The Annual Reports to the Board of Trustees describe the activities of ETS in
detail. Literature concerning the offerings may be obtained on request.

⁵ Bulletins of the Educational Records Bureau, published annually.




Bureau's services may be obtained at the address given above. No
counselor can afford to be unfamiliar with its offerings.

COMMERCIAL TEST PUBLISHERS 

The test publishers who sell tests for profit may be grouped into
four categories. Since there are many publishers in each of three
of the categories, only samples of names are presented. Complete
lists may be seen in Buros' Mental Measurements Yearbooks
mentioned previously.

The first category includes commercial test publishers whose
primary purpose is the sale and distribution of tests. These
organizations are not sponsored nor supervised by professional
organizations, regardless of the titles they have chosen for their business.
They are maintained and continue to operate on the basis of profits
obtained from the sale of tests. Some of the organizations in this
category are:

Science Research Associates. 57 West Grand Avenue, Chicago 
10, Illinois. 

California Test Bureau. Head office, 5916 Hollywood Avenue,
Los Angeles 28, California. 

Public School Publishing Co. Bloomington, Illinois. 

Educational Test Bureau. 720 Washington Ave. S.E.,
Minneapolis, Minnesota.

Committee on Diagnostic Reading Tests. Kingscote, Apt. 39, 
419 West 119th Street, New York 27, New York.

A second category includes several commercial companies whose
primary concern is the publishing of books and who also distribute
tests and related materials. Most of these organizations are
members of The American Textbook Publishers' Institute and such
membership offers a preliminary screening for the counselor who
is seeking quality products. Many of these publishers have
established enviable reputations in their fields over a long period of
time and are not likely, if they can avoid it, to jeopardize that
reputation by offering unsatisfactory tests for sale. Unfortunately it has
not always been possible to avoid this situation. The counselor
should, however, examine the offerings of such companies as the
following when he begins to select tests:

Houghton Mifflin Co. 2 Park Street, Boston 7, Massachusetts. 

World Book Co. Yonkers, New York. 

The third group includes university presses that publish a
number of psychological tests and diagnostic devices. They are, of
course, under the control of universities of high international
repute and their editorial responsibility is at a high level. Again,
some of these organizations produce tests and inventories that may
seem of doubtful value to the counselor. Acceptance of the high
ethical standards of the publisher does not imply complete
acceptance of their test materials nor the theories on which they are
based. Examination of the offerings of such organizations as those
noted below, and indeed of any of the university presses, is
recommended to counselors.

Stanford University Press. Stanford, California. 

Harvard University Press. Cambridge 38, Massachusetts. 

Bureau of Publications, Teachers College. Columbia University, 
New York 27, New York. 

A fourth category is added because one major commercial source 
does not clearly fall in the first three. The Psychological Corpora- 
tion, 304 E. 45th Street, New York 17, New York, is a special case. 

It was organized some 25 years ago to provide instruments and
techniques developed by the psychological profession. Its board
of directors, officers, and staff consist of members of the American
Psychological Association and the ownership of its stock is
restricted by its charter to members of that association. It is
composed of divisions devoted to market and social research,
industrial, clinical, and professional examinations and tests. Its Test
Service Bulletins,⁶ which may be obtained without cost, are models
of a professional service that can be used by counselors. The
direction of The Psychological Corporation's efforts by members of the
American Psychological Association suggests that high ethical
standards are maintained. It is not suggested, however, that the
counselor must agree with their basic thinking on measurement or
accept without question all the materials they produce.

TESTS USED BY AGENCIES OF GOVERNMENT

Under this heading the counselor will note such tests as the 
Selective Service College Qualification Test, the General Aptitude 
Test Battery of the United States Employment Service, and perhaps 
the state-wide testing services. In the case of the first of these he
will not have any voice in the construction or administration of
the tests but counselees may consult him about times at which the
tests are to be taken and about the interpretations and use of 
scores. 

The General Aptitude Test Battery, consisting of 12 subtests 
and designed to measure aptitudes said to be necessary in certain 
occupational areas, is administered and scored by State Employment
Services. In many schools, representatives of the agency seek
to administer the battery to high school seniors who plan to enter
employment immediately after graduation. They will interpret the 
scores to such volunteers and make them available to school coun- 
selors in specific cases. 

The counselor who plans to use such services must first seek an- 
swers to several questions and make certain decisions about testing 
policies. He may find, for example, a tendency for economy-minded 
and antiguidance school personnel to proclaim that they need not 
be concerned with guidance since the employment services will do 
what is necessary. He must ask whether these tests, coming in the 





senior year, are administered too late to be of much value. He
must decide if he wants a nonschool agency to determine the kinds
of tests that would be most suitable for his counselees. He must
consider whether the cooperation of the school with employment
services and employers in the use of tests will not result in better
relationships among all three agencies. And, finally, he must decide
whether tests designed for use in selection may be useful in
counseling.

On this latter point the evidence is not clear. One author⁷ has
pointed out the following difficulties in use of the test: There is
no intelligible guidebook for the interpretation of test results;
national norms for high school seniors by sex that would permit
comparison of a senior's score with a clearly defined group to which he
belongs are not available; percentile norms of young high school
graduates in various occupations are not provided; reliability data
on the test when it is used with high school students and probable
errors of measurement for various groups are not offered; data on
predictive validity in terms of comparisons of performances of
high school seniors with well-defined criteria of job success after
employment are not given. At least one follow-up study of the
test battery⁸ has shown no significant differences in job satisfaction
of a group of young workers who had taken the tests and received
interpretations of their scores while in high school and a matched
group that had not.

In one sense the problems and questions raised above reflect
those that are commonly met when the counselor becomes involved
in testing programs in which he cannot select his own tests, set up
the rules for testing and for interpreting the scores to his
counselees, and tailor the measurement program to particular
individuals. There are, of course, some compensations. He is, in general,

⁷ T. E. Christensen. "Helping Students Enter Industry." Vocational Guidance
Quarterly, Autumn, 1954, pp. 24-26.

⁸ Carl Trieger. "Effectiveness of the United States Employment Service General
Aptitude Test Battery in Employment Counseling of High School Seniors."
Unpublished Ph.D. Thesis. Madison: University of Wisconsin, 1955.




likely to get better tests at less cost than he may be able to procure
from commercial test publishers. In state or regional programs,⁹
usually centered at state universities, he is likely to get norms on
subjects that are similar to his counselees, and he may have the
advantage of participating in a professional approach to test
development, study of fresh materials, and examination of new
techniques. No counselor can overlook the offerings of such agencies
when he plans his measurement program.

LOCALLY CONSTRUCTED TESTS 

Counselors who are employed in educational institutions where
there are large enough enrollments to permit development of satis- 
factory norms may find it to their advantage to use locally devised 
tests. The obvious limitations presented by using local materials
for an essentially mobile population do, however, limit the value 
of this source except in unusual circumstances. As indicated later 
in this volume, the use of locally constructed achievement tests may 
be satisfactory when current, but not necessarily predictive, evi- 
dence about a student's performance is needed. 

CHECKING ON THE TEST PUBLISHER


Regardless of the sources to which a counselor goes to get his 
tests, he should apply certain criteria in judging their merits. Ap- 
plication of these criteria will be illustrated for particular tests in 
Chapters IV and V. The following six criteria may be listed at this 
point. 


1. Have the publishers done enough research to demonstrate
that the merits they claim for the test are valid?

2. Do the publishers limit the claims for the value of their tests 
to what can be demonstrated by research and usage? (This point


CoorJwutal Regioful T«tiog Pcogruw ia Hii 
Sdioot. .Scu- Dirttiien fer MrMMmim jnJ CniJj*et n •! 

EJw*Uon.S<r,e»l,No.;o. IWipp-OT-lOJ. Council . 




is particularly important to counselors. Many authors of tests state 
that they may be used for educational and vocational guidance of 
individuals but provide data that refer only to use with groups.)

3. Do the publishers continue to assume responsibility for a test 
after its publication? Do they provide revisions and improvements 
when their need is indicated? 

4. Do publishers actually limit the sale or distribution of tests 
to qualified users? Statements of policies with respect to this matter 
have been published in The American Psychologist, August, 1946, 
1, No. 8, pp. 353-357. 

5. Do publishers provide professional services to test users?
Some companies give titles to their salesmen which imply that they
are consultants. Guidance personnel should consider the definition
of the word "consultant" before they seek counsel on the purchase
of tests.

6. Is the publishers' advertising ethical and dignified? In
professional fields there is common agreement that advertising must meet
such criteria. The counselor will look with as much suspicion on
test publishers who resort to high-pressure advertising of testing 
programs to meet all his needs in a neat package as he would on 
the advertising of a patent medicine that is supposed to cure all 
his physical ills. 


SUMMARY 

In this chapter some of the problems of selecting a test from a 
wide range of types and forms have been presented. Tests have 
been classified with respect to methods of administration, the items 
they contain, the performances demanded of the subjects, the 
scores they produce, and their purported use. It has been shown 
that selection of the test or tests that might be most useful from 
among the many available is a complicated task. It requires exami- 
nation of much research and the weighing of many kinds of con- 
flicting evidence. It has been demonstrated that test labels are 




generally misleading and that the counselor must be concerned
with more than a test title. Sources of tests have also been
indicated. In the following chapter the processes involved in the
selection of a test, regardless of the title it carries, will be considered.

Discussion Questions and Exercises

1. Select several tests with essentially the same title or purportedly
measuring such things as mechanical aptitude and classify each
according to the descriptive terms suggested in this chapter. How
do they compare? How would you account for the differences of
approach? Which of those you have analyzed do you think
represents the soundest approach, considering the nature of the area to
be tested and the stated purposes of the tests? Why?

2. Using the tests selected for the exercise above, examine them for
types of items used, e.g., pulleys, tool identification, spatial
relations, etc. How do the tests compare in terms of types of items
used? What types of items, if any, are common to each of the
tests? What types of items are peculiar to a specific test? Are these
consistent with the stated purposes of the test? How would you
account for the differences? Is there evidence in the test manuals
to suggest that the tests are equally effective in spite of the
differences in test items? To what extent do the test authors defend
the inclusion of their items?

3. As a class project, examine as many tests as you can from your
specimen library and compile a list of traits, skills, achievement
areas, personal characteristics, and so forth. Place this list in the
left-hand column of a table such as Table 5. Provide a further
column for each of the major test publishers. On the basis of data
obtained from their catalogues and other publications, place a
check mark in the appropriate line and column whenever a
publisher offers a test or subtest score appropriate for use at the high
school level which corresponds to the trait, skill, etc. listed in the
first column. The partial table, presented as an example, suggests
that publishing companies A, C, D, E, G, and H all offer a test
designed to measure mechanical aptitude.



TABLE 5. Survey of Special Tests and Publishers

[Check-mark table. Rows: Trait, Skill, Achievement Area, etc. (mechanical aptitude, neurotic tendency, clerical aptitude, reading speed). Columns: Test Publishing Company, A through H. An X marks each company that offers a test for the trait.]


Can you think of any areas of performance or behavior that are
not represented by the list of traits, etc., presented in the first
column of your table? What areas of measurement appear most
often among the test publishers? Are there any areas of measure- 
ment represented on which you believe it would not be desirable 
to have performance data? Would you regard it as desirable to 
give a student an extended battery of tests representing these areas 
listed in the first column? Why or why not? What publishers 
appear to provide the widest range of measurement? How would 
you account for the duplication in the offerings among the pub- 
lishers? 

4. A description of the job of cabinetmaker might include the
following statements regarding the nature of the work:

Studies work orders, drawings, blueprints or other specifications;
measures with such instruments as calipers, scale, square; may use
hand tools such as ripsaw, rabbet plane, files, hand plane; uses
such woodworking machines as circular ripsaw, tenoner; turns
round parts to desired diameter on a lathe; forms such wood joints
as butt joint, miter joint, mortise and tenon, and lap joints; checks
vertical and horizontal trueness with carpenter's level; may apply
oil, stain, or polish to complete articles; installs hardware such as
hinges; estimates job costs; makes sketches or drawings of work to
be done.¹⁰

¹⁰ Adapted from "Job Description for Cabinetmaker." Guide series,
Occupational Analysis and Industrial Services Division, U.S. Employment Service,




For what kind of evidence would you look if you were counseling
with a student who planned to enter training for this job?
What types of tests would provide the best measurement in the
areas suggested by the statements? Can you find examples of
specific tests that would provide appropriate measurement? What 
evidence of the appropriateness is offered in the manuals? Do any 
of the manuals specifically mention the prediction of success in 
cabinetmaking or related occupations? 

5. Evaluate one test-publishing company in terms of the six criteria
presented on pages 42-43. Use the publisher's catalogue, copies of
test manuals, advertising materials, and any other available sources
of information. You may wish to compare your findings with
others in the class who have selected the same publisher and with
those who have selected other publishers. Do the publishers
reviewed rate about equally well? Does there appear to be any
particular area of weakness? Which of the criteria would you
regard as most important? How do the publishers reviewed by the
class rank on the basis of this one criterion?

TESTS FROM WHICH THE ITEMS ON PAGES 34-35 WERE TAKEN.

1. Stanford-Binet Intelligence Test

2. Progressive Achievement Test

3. Numerical Ability section of the Differential Aptitude Tests

4. Progressive Achievement Test

5. Number section of the Tests of Primary Mental Abilities

6. Language usage section of the Differential Aptitude Tests

7. Stanford-Binet Intelligence Test

8. Progressive Achievement Test

9. Language usage section of the Differential Aptitude Tests

10. Henmon-Nelson Test of Mental Ability

11. Verbal reasoning section of the Differential Aptitude Tests

12. Mechanical reasoning section of the Differential Aptitude Tests

13. McQuarrie Test of Mechanical Ability

14. Henmon-Nelson Test of Mental Ability

15. Bennett Stenographic Aptitude Test

yi nZTCre';.,™™, OEC.. 





References 

American Educational Research Association. "Psychological Tests and 
Their Uses.” Review of Educational Research, February, 1947, 17. 

American Educational Research Association. Review of Educational 
Research, February, 1953, 23. 

American Psychological Association. Technical Recommendations for 
Psychological Tests and Diagnostic Techniques. Joint Committee 
of the American Psychological Association, American Educational 
Research Association, and National Council on Measurements Used 
in Education. Supplement to Psychological Bulletin, March, 1954, 
51. Washington, D.C.: American Psychological Association, 1954. 

Anastasi, Anne. Psychological Testing. New York: Macmillan, 1954. 

Buros, O. K. The Fourth Mental Measurements Yearbook. Highland 
Park, N.J.: Gryphon Press, 1953. 

Cronbach, Lee J. Essentials of Psychological Testing. New York: 
Harper, 1949. 

Freeman, Frank S. Theory and Practice of Psychological Testing. 
Revised Edition. New York: Henry Holt, 1955. 

Goodenough, Florence. Mental Testing. New York: Rinehart, 1949. 

Hildreth, Gertrude H. A Bibliography of Mental Tests and Rating 
Scales. 1945. Supplement. New York: The Psychological Corporation, 
1946. 

Kirk, Barbara A. "Test Distributors and Our Needs." Occupations, 
January, 1951, 29:257-259. 

Laitin, Yale J. "Why Publishers' Representatives Were Born." 
Occupations, January, 1951, 29:260-263. 

National Education Association. Technical Recommendations for 
Achievement Tests. Report of Committees on Test Standards of 
the American Educational Research Association and the National 
Council on Measurements Used in Education. Washington, D.C.: 
National Education Association, 1955. 

Super, Donald E. Appraising Vocational Fitness. New York: Harper, 
1949. 

Traxler, Arthur E. Techniques of Guidance. Revised Edition. New 
York: Harper, 1957. 



CHAPTER III 


Criteria of Test Selection 


The publication in 1954 by the American Psychological Association 
of its bulletin on technical recommendations¹ for use of 
psychological tests and diagnostic procedures was an important 
event in the field of measurement. The very fact that the publishers 
thought that the recommendations were necessary implied, and in 
some cases the comments suggested specifically, that there had 
been some serious shortcomings in tests and in reports about them. 
The bulletin reviewed common weaknesses in descriptions of tests 
and suggested many ways in which the shortcomings might be 
avoided. Thorough study of this bulletin is an obligation of all 
those who produce, distribute, and use tests but, as the authors 
have pointed out, the publication of adequate information about 
them does not guarantee that tests will be used wisely or well. In 
this chapter, the reader will find suggestions for better selection of 
testing materials, and he will also be introduced to some of the 
problems that counselors meet in their use and interpretation. 

¹ Technical Recommendations for Psychological Tests and Diagnostic Techniques. 
Psychological Bulletin Supplement, March, 1954. 



VALIDITY 

One of the interesting phenomena in the field of education has 
been the widespread use of tests despite the fact that most of their 
manuals contained no adequate evidence that the tests could 
accomplish what they purported to do. When one refers to the 
question whether a test does what its authors claim that it can do 
(assess mechanical aptitude, determine readiness for reading, 
predict success in college, etc.) the term validity is used. In a way, 
the choice of this generally useful word for technical use is most 
unfortunate. A valid signature on a check may actually be the 
signature of the person but it does not guarantee that the check will 
be backed by sufficient funds. The name on a bottle of medicine 
may be the name of the product given to it by its makers but the 
medicine may not actually do what the manufacturers claim for it. 
It is conceivable that a test may be a valid measure of, say, 
mechanical reasoning, but it may fail to accomplish what the authors 
claim for it, that it will select the persons who are most likely to 
succeed in mechanical training or occupations. The counselor must 
always seek answers to such questions as: Valid for what? Valid 
for whom? and, Valid under what circumstances? And he will 
examine the evidence about the validity of a test with such questions 
in mind. 

Until recently the general term "validity" has been used without 
any qualifying adjective. It is now becoming more common, after 
the urging of the committee that published the technical 
recommendations for use of psychological tests, to add an adjective 
before the word to indicate the kinds of evidence that are offered to 
substantiate the claim of validity. Currently the adjectives most 
commonly used are content, predictive, concurrent, and construct. 
Brief descriptions of the conditions under which counselors will be 
concerned with these four kinds of validity and of the ways in 
which test builders procure data about them are presented below. 





CONTENT VALIDITY 

A teacher may want to know the current level at which a student 
can perform on test materials drawn from subject fields in which 
he has been given instruction. Before he uses a test for this purpose 
he will need assurance that the items cover completely, or provide 
adequate samples of, the subject matter about which the 
conclusions are to be drawn. Thus if there is incontrovertible evidence 
that the items cover all the material that instructors have tried to 
teach their students to use in the way they are trying to get them 
to use it (and the school board members and parents in the 
community agree that the materials and the methods of using them are 
what they want for their children) the test will have content 
validity. If the test has been reduced in length by using samples of 
materials covered it will be necessary to have evidence that the 
sample is qualitatively representative and numerically adequate. If 
such evidence is presented, the content validity will be satisfactory 
and, providing other criteria of test construction and 
administration to be described later have been met, the scores may be used 
with confidence as a measure of a student's current performance 
in the area in which he was tested. 

PREDICTIVE VALIDITY 

The last sentence in the previous paragraph should be read 
again and particular attention should be paid to the word current. 
If tests with high content validity were available, and current 
performances always predicted future performances without 
substantial error, counselors would find themselves well equipped with 
tools to aid their clients in the choice of future academic and 
vocational activities. There is a great deal of evidence, however, which 
indicates that current tests of performances do not do so. It will 
be necessary, if the counselor is to attempt prediction² of future 
performances of a counselee by means of test scores, to have some 
evidence of the predictive validity (in a sense the forecasting 
efficiency) of the instruments. If the tests are designed to measure 
mechanical aptitude, and if a youth's scores on them are to be used 
as one of the sources of evidence about likelihood of success in 
a mechanical occupation at some time subsequent to that at which 
he took the tests, there must be some evidence of the relationships 
between scores on the test and future performances in mechanical 
work. If so-called intelligence or scholastic aptitude tests are to be 
used as partial indicators of later achievement in college there 
must be some evidence that the test scores do forecast degrees of 
success in college. 

When a counselor looks for predictive validity in a test he is 
not necessarily concerned with the content of the test or the form 
of the items, although it should be noted that, for successful 
interpretation of tests to subjects, it seems desirable to have some 
content validity. It is difficult, for example, to get a parent who is a 
mechanic to accept, as evidence of his son's fitness for mechanical 
work, the scores on a mechanical reasoning test that contains 
picture items of race tracks, billiard tables, and children on swings, 
as predictively valid as these items may be. It is theoretically 
possible that the speed with which a candidate for air corps pilot 
training ran a mile might predict later success in flight training, that 
achievement in spelling might predict future performance in 
accounting, or that skill in translating an artificial language might 
predict eventual success in mathematics. And if there was evidence 
that they did so they might well be used as tests with high 
predictive validity in those areas. The point here is that there must 
be evidence of some relationship between scores on tests and later 
performance on some criterion if the test scores are to be used to 
predict later performance. It is important that counselors 
recognize the fact that predictive validity is not a necessary component 
of content validity or vice versa. Tests that sample current 
performance very effectively may fail completely to predict future 
performance. Failure to recognize this fact is probably one of the 
chief reasons why there has been so much misuse of tests. Since 
predictive validity is important in the use of tests in counseling, 
it will be given further consideration in the section on interpreting 
test scores in Chapters VI and VII. 

² The argument about whether or not counselors should predict need not be 
considered here. In effect, whenever a counselor and counselee work out a plan of 
action together and agree to move to the execution of the plan there is an implied 
assumption that it will work out well, in a sense a prediction that the best possible 
plan has been chosen. 

CONCURRENT VALIDITY 


Because it can be most readily secured, information about the 
concurrent validity of tests is most commonly found in test 
manuals. It is a relatively simple matter to give an achievement test to 
a group of students in any field, to collect students' marks that 
purport to be evidence of achievement in it, to compute the 
correlation between marks and test scores, and to draw the inference 
[...] correlation between marks and [...] scores [...] practice to compare 
scores made on a newly constructed mental ability test with those 
[...] and to conclude, if the coefficient of correlation is fairly high, 
that the new test is a valid measure of what the other test measures. 
[...] offered as evidence of validity but it is also strange to see such 
findings followed by suggestions that "subjective" marks now be 
replaced with "objective" test scores, or that a new test replace 
an old one simply because the new test scores correlate well with 
the one that was formerly used. 
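
The arithmetic behind a concurrent validity coefficient of the kind just described can be sketched briefly. The example below is a minimal illustration in Python and is not taken from any test manual: the function name pearson_r and the two lists of scores and marks are hypothetical, invented only to show the computation of a product-moment correlation between test scores and teachers' marks collected at about the same time.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: scores on a new achievement test and the marks the
# same eight students currently hold in the subject.
test_scores = [52, 61, 47, 70, 66, 58, 75, 49]
teacher_marks = [78, 82, 70, 88, 85, 74, 91, 73]

r = pearson_r(test_scores, teacher_marks)
print(f"concurrent validity coefficient r = {r:.2f}")
```

A high coefficient from such a computation shows only that the test ranks these students much as their current marks do. It offers no evidence about later performance, which is the concern of predictive validity.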

It appears then that counselors will not be particularly 
concerned with concurrent validity. A statement in a test manual 
reporting a correlation coefficient of .47 between scores on a mechanical 
aptitude test and current proficiency records of women operators 
of gum-wrapping machines is not likely to be useful to a counselor 
working with a boy who is trying to decide whether he should 
enter a mechanical occupation.³ 

CONSTRUCT VALIDITY 

The committee of the American Psychological Association that 
prepared the technical recommendations for use of psychological 
tests and diagnostic techniques indicated that construct validity 
is ordinarily reported when the tester has no definitive criterion 
measure of the quality with which he is concerned and must use indirect 
measures to validate the theory. In construct validity the trait or 
quality underlying the test is of central importance rather than 
the scores on a criterion. The concept of construct validity is vastly 
different from that of predictive validity in which the criterion is 
of utmost importance. The items on a test with predictive validity 
may even seem to an observer to have nothing in common with the 
criterion. 

Construct validity is of particular concern to those who attempt 
to measure personality and who, in the process, meet the problem 
of the lack of relationship between the personality traits they 
propose to measure and the overt behavior of individuals. Evidence of 
construct validity is commonly presented by those who attempt to 
inventory counselees' interests. They depend on construct validity 
when they infer that individuals have certain vocational interests 
when their patterns of scores on an interest inventory are similar 
to the scores made by members of an occupational group. 
Predictive validity in such cases is hard to procure because many 
individuals neither profess nor exhibit such interests in any situation other 
than the one in which they fill out the inventory. 

³ T. W. MacQuarrie. MacQuarrie Test for Mechanical Ability. Manual. Los 
Angeles: California Test Bureau, 1953, p. X 

If one is to depend on construct validity in devising a test, it is 
essential that the theory underlying the test be clearly defined, that 
the tester show how he proposes to interpret the testee's behavior, 
demonstrate how adequately he believes that his interpretation is 
justified, and demonstrate clearly the evidence and reasoning that 
have led him to that belief. Thus it may be theorized that persons 
in certain occupations have developed certain patterns of interest 
that are common to their group. Distinctions in the inventory 
patterns between such persons and all other groups would then have 
to be demonstrated beyond any reasonable doubt. Such theorizing 
and demonstration, it must be noted, could be well and thoroughly 
done but their completion would not necessarily mean that the 
patterns revealed by the interest inventory would be useful in the 
counseling of a youth who was in the process of choosing an occu- 
pation or training for it. To make the inventory scores useful to 
the counselor it would be necessary to provide data concerning the 
development of interest patterns by members of an occupational 
group, data on the consistency of their interests before entering 
the occupation or training for it, their interest patterns during 
training and early occupational experiences, and finally for a long 
period of work in the occupation. 

It appears then that, for the counselor, evidence of the construct 
validity of a test will not be enough. He will look for the 
theoretical constructs behind any test, hope that they are sound, and look 
carefully at the evidence. He will then demand that some evidence 
of predictive validity be presented. 



VALIDITY FOR COUNSELING 

GROUP VS. INDIVIDUAL VALIDITY 


There will always be, for counselors, the special problem raised 
by the fact that a test may be highly valid for groups in terms of 
the four categories described above and yet be invalid for the 
particular individual he is working with at the time. Correlation 
coefficients present the overall picture and the expectancy tables 
described in Chapters VI and VII give odds on the average, in 
general, on the whole, and other things being equal. The counselee 
may be glad to see the general relationships that they indicate, but 
then he may ask what the data mean for him. No tests provide the 
answer to his question. The best they can offer are generalities, 
odds, chances, and probabilities. 

The coefficients usually presented as evidence of validity hide 
the fact that there may be many cases within a distribution in 
which the relationships between test scores and criteria may run 
counter to the general trend. Within certain subgroups of a large 
population the relationships may be much higher than the coeffi- 
cients suggest. The general conditions under which the test was 
given may have influenced enough of the scores of enough of the 
subjects to indicate a general trend. They may not, however, have 
influenced the score of a particular counselee. 
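
How often individual cases run counter to the general trend can be estimated under simplifying assumptions. The sketch below assumes, purely for illustration, that test scores and criterion measures follow a bivariate normal distribution with correlation r and that both are divided at the median. Under those assumptions the proportion of persons above the median on both measures is 1/4 + arcsin(r)/(2π). The function name hit_rate_above_median is hypothetical, and real test data need not meet these assumptions.

```python
from math import asin, pi


def hit_rate_above_median(r):
    """Proportion of persons scoring above the test median who are also
    above the criterion median, assuming a bivariate normal distribution
    with correlation r (an illustrative assumption, not a general fact)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    # P(test above median AND criterion above median)
    both_above = 0.25 + asin(r) / (2 * pi)
    # Half of all persons score above the test median, so condition on that.
    return both_above / 0.5


for r in (0.0, 0.3, 0.5, 0.9):
    hits = hit_rate_above_median(r)
    print(f"r = {r:.1f}: with the trend {hits:.0%}, counter to it {1 - hits:.0%}")
```

Under these assumptions a validity coefficient of .30 places roughly six in ten high scorers above the median on the criterion, so that about four in ten run counter to the trend the coefficient describes.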

The counselor who is aware of the difficulties involved in the 
application of general findings to particular cases will realize that 
a coefficient of validity less than 1.00 (and most of them are far 
below that figure) presents difficulties in use for counseling the 
individual except in terms of the odds, probabilities, and chances 
described in Chapters VI and VII. 
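
A rough sketch of how such odds might be tabulated follows. It assumes follow-up records consisting of a test score and a simple succeeded-or-failed outcome for each person. The function name expectancy_table, the width of the score bands, and the records themselves are hypothetical and serve only to show the form of the computation; the expectancy tables of Chapters VI and VII may be constructed differently.

```python
from collections import defaultdict


def expectancy_table(records, band_width=10):
    """Group (score, succeeded) pairs into score bands and return, for each
    band, the number of cases and the proportion who later succeeded."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [cases, successes]
    for score, succeeded in records:
        band = (score // band_width) * band_width
        counts[band][0] += 1
        if succeeded:
            counts[band][1] += 1
    return {band: (n, s / n) for band, (n, s) in sorted(counts.items())}


# Hypothetical follow-up data: (test score, succeeded in later training?)
records = [
    (42, False), (45, False), (48, True), (47, False),
    (51, False), (55, True), (58, True), (53, False), (57, True),
    (62, True), (66, True), (64, False), (69, True), (61, True),
]

band_width = 10
table = expectancy_table(records, band_width)
for band, (n, proportion) in table.items():
    top = band + band_width - 1
    print(f"scores {band}-{top}: {n} cases, {proportion:.0%} succeeded")
```

Even in this invented example one person in the highest band failed and one in the lowest band succeeded. The table states odds for groups of scorers; it does not pronounce on any one counselee.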

TEST INTERPRETATION FROM IMPLIED VALIDITY 

Probably nothing has caused more mischief in the field of 
testing than the naive interpretation of validity data. Listed below are 
some samples from among many of the misuse of tests of 
questionable validity submitted by schools as evidence of the way they use 
tests and inventories effectively in their guidance programs.⁴ 

This test [The Otis Test] shows that in high school competition you 
should earn an average grade of ____ or higher; never a lower grade 
than the one indicated. If you do earn a lower grade than the one 
recorded above you are underachieving and someone with less ability 
worked and traded grades with you. 

Another school gives this report to a student: You should prepare 
for an occupation that follows one of the interests [areas from the 
Kuder Preference Record are given], preferably the first choice. You 
should plan to train in [specific occupation is cited] as evidenced by 
results of other tests in our files. 


If these were only isolated cases the problem would be bad 
enough but examination of reports on the use of tests in counseling 
suggests that they are all too common. And it seems that this kind 
of interpretation is due in no small measure to the overenthusiastic, 
exaggerated, and insufficiently qualified statements in test manuals. 
Consider, for example, the following quotations from such sources. 
(Italics ours.) 


Word-Fluency is the ability to write and talk easily. People to whom 
words come rapidly and fluently are high in W. Careers requiring W 
include actor, stewardess, reporter, comedian, salesman, writer and 
publicity man. Being high in W helps in drama classes, public 
speaking, radio acting, debate, speech, and journalism.⁵ 

Individuals who score high in this test possess the capacity to 
understand and profit from their experience. They should do well in 
reading literature and drama. They possess some of the basic abilities 
involved in understanding others and making others understand 
them.⁶ 

⁴ Subcommittee on Guidance Problems. "Extended or Potential Optimum Guidance 
Practices in Small, Medium, and Large North Central High Schools." The North 
Central Quarterly, October, 1949, 2S:174-246. 

⁵ [...] For Ages 11-17. Chicago: Science Research Associates, Revised, 1949, p. 2. 

Studies of sales personnel, for example, indicate that a successful 
salesman is above average in memory, arithmetic ability, and ability 
to express himself well. Therefore an individual's scores on FACT 
tests 3, 9, and 14 (measuring his ability on these three skills) provide 
an estimate of his probable success in sales work.⁷ 

Those persons with high scores (on the Stenographic Aptitude Test) 
can be advised that they may enter the course with considerable 
likelihood of success. Those with low scores should be counseled 
against this choice.⁸ 

Students with profiles of this kind [better than 75th percentile 
standing on four factors] possess enough of these abilities to succeed 
in virtually any type of academic endeavor or career provided that 
interest and motivation are likewise sufficiently high.⁹ 

High scores on the non-verbal series should indicate likelihood of 
success in jobs calling for visualizing and for thinking in concrete 
terms. High scores on the verbal series will indicate probable success 
in jobs in which language and ideas expressed in words play a large 
part.¹⁰ 

Another approach is to note the way in which it is implied in 
test manuals that students' and parents' questions may be answered 
by use of tests. The predictive validity that is assumed in the 
statements is frequently unencumbered by evidence. 

⁶ Manual for the California Short-Form Test of Mental Maturity, Advanced, 
1951-S Form, Grades 9-Adult. Los Angeles: California Test Bureau, 1951, p. 9. 

⁷ Counselor's Booklet: Flanagan Aptitude Classification Tests. Chicago: Science 
Research Associates, 1953, p. 4. 

⁸ Manual for Stenographic Aptitude Test. New York: The Psychological 
Corporation, 1939, p. 1. 

⁹ Manual for the Holzinger-Crowder Uni-Factor Tests. Yonkers, N.Y.: World 
Book Co., 1955, p. 17. 

¹⁰ General Manual for the Lorge-Thorndike Intelligence Tests. Boston: Houghton 
Mifflin Co., 1954. 





QUESTIONS THAT STUDENTS AND OTHERS ASK, EACH FOLLOWED BY 
STATEMENTS FROM TEST MANUALS THAT SUGGEST THE TEST SCORE 
MAY PROVIDE ANSWERS TO THE QUESTIONS. (ITALICS ADDED) 

"Should I take algebra next year?" 

. . . it is probable that students ranking below the twenty-fifth 
percentile will find the subject so difficult that they should be 
excluded or diverted into a course in simplified mathematics. 

"Could I be a good mechanic?" 

. . . measures the ability to perceive and understand the relationship 
of physical forces and mechanical elements in practical situations. 
This type of aptitude is important for a wide variety of jobs and for 
engineering and many trade school courses. The person who scores 
high on this trait [mechanical comprehension] tends to learn readily 
the principles of operation and repair of complex devices. 

"Is my son good enough in mathematics?" 

The use of this test reveals to the student and to the teacher not only 
proficiency or competence in the use of groups of skills, e.g., skills in 
the use of fractions, but also in the use of specific skills, e.g., dividing 
a whole number by a fraction. The test results, therefore, provide a 
basis for the determination of group and individual instructional 
needs, and the selection and counseling of students relative to 
enrollment in courses and activities requiring basic skills in arithmetic.¹³ 


[...] Associates, 1945. [...] Arithmetic. Chicago: Science Research [...] 



"Could my son succeed in an art career?" 

Individuals falling into this range [percentiles 76-100] should, other 
things being equal, find almost certain success in an art career. 
Anyone making a score in the 1-25 percentiles quarter should take other 
available tests and inquire into all other factors mentioned above 
before proceeding further. If other data corroborate this finding, the 
person would do well to reconsider before going further into art. The 
individual will most likely find his abilities better suited for other 
lines of work.¹⁴ 

"Could I succeed in mechanical training?" 

It is a short, single instrument capable of identifying individuals who 
are good risks for training in certain types of mechanical activity, and 
it is also an instrument which tests the ability of an individual to 
apply mechanical principles. 

Thus a high school boy who compares very favorably with other high 
school boys on this test should be encouraged to consider seriously 
the mechanical field. Should he get a comparatively low score when 
compared with unselected high school boys, he should be encouraged 
to explore other occupational outlets. He will not be able to compete 
with individuals who have high mechanical aptitude if he cannot 
compete satisfactorily with an unselected population.¹⁵ 

"Have I any aptitude for clerical work?" 

The paragraphs below will tell you briefly about your scores. . . . A 
high score on this test [office vocabulary] shows that you are well 
equipped to learn jobs that involve letter writing, following and 
giving directions, reading, or talking with people. A good vocabulary is 
especially important for typists, receptionists, stenographers, and 
secretaries. A high score on this test [office arithmetic] marks the 
person who can learn computational jobs easily and do them well. 
A high score on this [office checking] test suggests that you can 
master detailed clerical work easily.¹⁶ 

"Could I succeed in algebra?" 

Every achievement test result obtained for a student during his school 
career has significance not only as a measure of what he has 
accomplished in a given course, but also as a predictor of what he is likely 
to do in the future, particularly in closely related fields. . . . It is 
possible, in a given area such as mathematics, to determine whether a 
student is consistently strong in the field, or whether he tends to 
manifest greater or lesser proficiency in this area, as he progresses 
through school.¹⁷ 

"What are my aptitudes?" 

It is suggested that these Aptitude Tests for Occupations be given in 
conjunction with a standardized interest inventory. [One published 
by the same company is recommended.] In this way a counselor will 
have not only an inventory or picture of the individual's occupational 
aptitudes, but also a reliable source of interests. High percentile 
ranks indicate the presence of aptitude: low percentile ranks, the lack 
of aptitude. . . . The major difference between an examinee whose 
highest percentile rank is at the 85th percentile and another whose 
highest rank is at the 55th percentile lies in the fact that the first 
individual can probably handle a higher level of work in that field. 
For example, if Smith ranks at the 85th percentile and Jones at the 
55th percentile in the General Sales field, it may be assumed Smith 
will be able to do large-scale selling (assuming equality of 
personality and other factors) while Jones will probably be confined to 
sales work behind a counter.¹⁸ 

"In what area in school or college can I do my best work?" 

Similarly, when the tests are used for their principal purposes, the 
counselor can apply the results in his work with students to: (a) 
help the student to understand his own strengths and weaknesses in 
comparison with students in certain normal groups; (b) guide the 
student toward choices of educational goals and courses most 
appropriate for him; (c) estimate the levels of achievement to be 
expected of the student; (d) compare the measured academic abilities 
of students in different classes, grade, and school groups.¹⁹ 

¹⁴ Manual for the Meier Art Test. Iowa City: Bureau of Educational Research and 
Service, State University of Iowa, 1942, pp. 9-12. 

¹⁵ Manual for the Survey of Mechanical Insight. Los Angeles: California Test 
Bureau, 1935, pp. 2-5. 

¹⁶ [...] 194[...] Science [...] 

¹⁷ Manual for the Blyth Second-Year Algebra Test, Evaluation and Adjustment 
Series. Yonkers, N.Y.: World Book Co., 1953, p. 6. 

¹⁸ Manual for the Aptitude Tests for Occupations. Los Angeles: California Test 
Bureau, 1951, p. 9. 

¹⁹ Manual for the School and College Ability Tests. Princeton, N.J.: Educational 
Testing Service, 1955, p. 3. 





PROFESSIONAL ACCEPTANCE AND ENCOURAGEMENT 
OF USE OF TESTS 

The exaggerations and enthusiastic claims of those who would 
sell tests are sometimes reinforced by authors whose statements 
unwittingly encourage the uncritical users of tests to interpret 
general findings about validity as if the generalizations applied to 
every individual who ever took them. Samples of such statements 
follow: (Italics ours.) 

A child's score on an intelligence test may be translated into a mental 
age which is an index of his readiness to undertake learning tasks of 
a certain level of difficulty.²⁰ 

In the interest inventories, the scores in many of the areas or fields 
have not yet been checked against adequate criteria, mainly because 
such criteria are difficult to establish. Even so, the studies made in 
certain interest areas indicate that the scores therein are sufficiently 
valid for guidance purposes.²¹ 

Measurements are useful for supplying facts on which better guidance 
may be based. At the present time, for example, there are 
well-constructed inventories of emotional balance, attitude scales, tests of art 
and music, interest blanks and procedures for measuring lying and 
cheating and stealing.²² 

Notwithstanding these limitations the formalized measuring 
instruments probably contribute more to our understanding of pupils than 
any other single method. Because of the nature of their construction 
according to a rather rigid experimental process, they possess higher 
validity and reliability than other techniques.²³ 

A great deal of progress has been made during the half century since 
the first serious attempts were made to measure intelligence 
objectively. Not only have the reliability [consistency] and validity 
[faithfulness of claim as to what is being tested] of the tests been increased, 
but a much greater variety of tests, for use under widely different 
conditions, now is available to the psychologist.²⁴ 

²⁰ T. L. Torgerson and Georgia S. Adams. Measurement and Evaluation. New 
York: Dryden Press, 1954, p. 96. 

²¹ J. A. Humphreys and A. E. Traxler. Guidance Services. Chicago: Science 
Research Associates, 1954, p. 156. 

²² A. M. Jordan. Measurement in Education. New York: McGraw-Hill Book Co., 
1953, pp. 4, 12. 

²³ R. D. Willey and D. C. Andrew. Modern Methods and Techniques in Guidance. 
New York: Harper & Bros., 1955, p. 123. 

It is refreshing, after reading such statements as those quoted 
above, to find Jones saying: 

A personnel officer can, by the use of tests, select the group in which 
most of the good material will be found; but he cannot predict among 
those individuals who will be successful or those who will fail. 
Individual prediction still eludes us, and it probably always will. This 
uncertainty should always be kept in mind by the counselor. The 
argument so often heard: '"These tests are the best instruments for 
predictions we have and therefore we must use them” is invalid; it is 
based on the assumption that the counselor must know just what the 
client's ability is and just what he needs if adequate help is given. 
Such knowledge is impossible; and even if it were possible its use 
would violate the fundamental basis of true guidance.²⁵ 

Counselors, it has been pointed out, meet two problems in their 
use of tests. There is the common problem of securing tests that 
are generally valid enough so that odds or probabilities can be 
stated for groups. There is the additional problem of trying to 
interpret the scores for the person who, having accepted the gen- 
eral evidence of validity, says, "But what about me?” Counselor’s 
problems with validity differ from those of the tester who seems 
not to regret too much that he will have some "false positives.” 
The fact that a young man may be classified as a neurotic on the 
basis of a test s^ore when there is no other evidence of any kind 
t^t he is neurotic seems to be passed over nonchalantly by those 
who think in terms orSiasses rather than persons.** That another 

2* L. B. Thorpe and W. W. Cruze. DnelapmeMial Piycholozy. New Vorfc; The 
RonaJd Press Co., 1956, p. 306. 

** Arthur J. Jones. Principles of Guidance. New York; McGraw-Hill Book Co., 
1951. p. 188. 

L. J. Cronbach. A ConsidertaioH of Juformasi^n Theory and Utility Theory as 



£4 Measurement for Guidance 

person be rejected for training toward which he has worked for
many years, and for which he may be eminently qualified except
for achieving a passing score on a particular test on a particular
day, does not seem to distress or disturb the authors of tests. They
seem to be so satisfied with a test that selects potential achievers
from a group only slightly better than does chance, and at the
same time misses almost as often as it hits, that they are willing to
publish and promote the test.** Their statements that tests with
low validity coefficients are "far from useless" omit the statement
that "use of such tests may do much harm to many individuals,"
even though it is equally applicable. It is always interesting to observe
the behavior of persons who are willing to utilize techniques
with low validity for use with other people's children but who become
greatly concerned when they are used on their own progeny.

The counselor will, of course, appreciate that there are times
when tests with low validity must be used because there is a job
to be done hastily with large groups and with no other instruments
available. During World War II, for example, it was necessary
to try to select the relatively few men who were most likely
to succeed in pilot training from among the many who aspired to
such training.** In order to do so a battery of tests was devised,
administered to the aspirants, and the predictive validity computed
by comparing the test scores with their later performances in training
or on the job. The predictive validity of the test battery was
low but, in view of the situation and the large numbers of aspirants,
the use of the test battery seemed to be justified. Counselors
will not conclude, because the use of a battery of tests with low
validity was justified in times of war and with large numbers in a
military setting, that it is justified in their regular counseling duties.


Tools for Psychometric Problems. Champaign, Illinois: Bureau of Research and
Service, University of Illinois, November, 1953, 65 pp.

** Psychological Corporation. "Better Than Chance." Test Service Bulletin, No. 45.
New York: The Psychological Corporation, May, 1953, p. 5.

** J. C. Flanagan. "The Selection and Classification Program for Aviation Cadets."
Journal of Consulting Psychology, September-October, 1942, pp. 229-239.




The authors are well aware that many of the techniques com- 
monly used in counseling may be less valid than the tests that are 
available. At times it seems that it is the questionable validity of 
these techniques that causes counselors to seek tests because they 
seem to be more valid. Such faith seems not to be justified on the 
basis of the evidence of predictive validity contained in test man- 
uals. 


RELIABILITY 

Among the many questions that arise in the interpretation of
test scores there is a basic one concerning the consistency of a subject's
performance. Since the items to which a student responds on
a test are just a selected sample of all the questions that could be
asked, it is necessary to determine how consistent he would probably
be if he were asked to respond to a similar group of test items
that sampled the same area of measurement. A counselor must
attempt, then, to determine the extent to which a student's test
score represents a true indication of his performance on the factor
measured by the test. If the score does not represent a true sample
and a subject's performance varies widely from one testing to another
in the same area, the score will not be useful in counseling.

The common practice in testing is to present some evidence of
the consistency of performances of students at one period of testing
or in two testing periods separated by short periods of time.
Much of the work of the counselor is, however, concerned with the
problem of consistency of performances of his counselees over long
intervals. He will want to know, for example, whether the student
who achieved a high score on a mathematics test this year is likely
to do so some years later. He will often ask, "How certain can I
be that the performance on the test at the time of testing will be
consistent with another performance at a much later time?" The
answers to such questions will be of considerable importance in
the use of test results in counseling. If the scores obtained by the
counselor are not stable he will find them of little help in evaluating
current achievement levels or predicting future performances.

METHODS OF DETERMINING RELIABILITY 

The split-half method. The split-half method is most typically
used for determining the reliability of tests. To the producer
of tests, this approach has several real advantages since it requires
only a single test administration. It saves the time, trouble, and
expense connected with methods that require a second administration
of the test or the designing and administration of an alternate
form. There are values and limitations to each approach. As a
consumer of tests, and as one responsible for their selection and
use with individuals, the counselor's concern will be primarily with
the evidence of consistency presented rather than with the practical
consideration of economy that may have influenced test authors
and publishers.

The coefficient of reliability obtained through the split-half
method is determined essentially by administering a test to a group
of individuals, dividing the test into halves (usually by using the
odd items as one half and the even items as the other) and obtaining
scores for each half. The coefficient of correlation between the
scores on the halves is computed and the result is called the coefficient
of reliability. Since the reliability coefficient obtained is for a
test only half as long as the test that will actually be used, the
Spearman-Brown formula is employed to estimate what the reliability
of the full test would be. The formula is presented in most
texts in general measurement and statistics and need not be repeated
here.
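
For readers who like to see the arithmetic, the split-half procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson_r` and `split_half_reliability` and the responses of six examinees are hypothetical, and the step-up uses the familiar two-halves form of the Spearman-Brown formula, r_full = 2 r_half / (1 + r_half).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown estimate for the full test).

    item_scores holds one list per examinee, each item scored 1 (right) or 0 (wrong).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # step up from half length to full length
    return r_half, r_full


# Hypothetical responses of six examinees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, stepped-up estimate = {r_full:.2f}")
```

Whenever the half-test correlation is positive, the stepped-up figure is larger than the correlation actually observed, which is one reason the coefficients printed in manuals can look so impressive.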

The split-half method is perhaps most commonly employed because
of its economy. Not infrequently the reliability coefficients
obtained by this method and presented in test manuals are impressively
high. The counselor should examine such coefficients in detail
before he accepts the evidence of consistency that the split-half
method purports to provide. He will need to be aware of the conditions
that must be met in a test before it is appropriate to use
the method and he will want to know what factors may have determined
the size of the coefficients. He will need to be sure, too, that
the group of individuals from whom the reliability data were obtained
is similar to the group on which he proposes to use the test.

Where split-half reliability is presented in a test manual, the 
counselor will look for evidence that the items on each half-test 
were comparable with respect to content, the form in which they 
were presented, the difficulty of the separate items, and their range 
of difficulty. These conditions are assumed in the split-half method 
and the coefficient will be spurious and inappropriate to the degree 
to which they are not satisfied. 

Before accepting and using reliability coefficients obtained by
the split-half method the counselor will need to keep in mind, for
example, that some conditions and factors that operate to cause
variation in an individual's performance in the real day-to-day situation
may not be reflected in the split-half method.** A person may
perform variously at a given task at different times for such reasons
as changes in health, fatigue, motivation, emotional strain, attention,
or accuracy. Since the split-half reliability coefficient is computed
from the scores on two halves of a single test, these factors
will be relatively and unrealistically constant in each half. This
consideration is of particular significance when the counselor is
interested in prediction over a period of time that may entail normal
fluctuations in performance.

The counselor must also be aware that the split-half method
should not be used for speed tests. Most tests do not have items
that are equal in form, content, and difficulty throughout. Such tests
as the typical clerical aptitude test do, however, fall into this category.
Because these tests require only simple comparisons of names,
series of numbers, or letter combinations, very few errors are likely
to be made by individuals who take the test. Score differentiation
is based primarily on the number of items completed correctly.
When a test of this kind is split into two parts and scored repeatedly
it is reasonable to expect the half scores to be so similar that
almost a perfect positive correlation coefficient is produced.** Differences
in scores of individuals are due largely to speed of response
at the time of testing since the day-to-day fluctuations that
account for some differentiation are not given an opportunity to
operate. The more appropriate methods of determining reliability
in such instances are the testing and retesting at a later time or
the using of alternate forms of the same test.

** These and other sources of variance of performance on a particular test are
treated in detail by R. L. Thorndike in Personnel Selection. New York: John Wiley
& Sons, 1949, pp. 72-78.

Perhaps the most dramatic demonstration of the spuriously high
coefficients that may result when the split-half method is applied
to speed tests is the now frequently cited example presented in the
manual of the Differential Aptitude Test ** and reproduced below.


Table 6. Reliability Coefficients by Grade Obtained for the Clerical Speed and
Accuracy Test by Testing the Alternate Forms and Split-Half Methods

           Form A vs.        Split-Half
Grade      Form B        Form A     Form B       N
  8          .77          .990       .996        48
[...]       [...]         .991       .989        50
[...]       [...]         .996       .983        45
[...]        .86          .992       .993        50
[...]        .92          .996       .969       [...]


The differences and their significance for counseling an individual
student will be apparent to the counselor, especially in the
example at the eighth grade level. By the alternate-forms method,
the resulting coefficient (.77) would be very questionable for group
application and wholly unacceptable for individual use. If only the
split-half coefficient had been reported it would have presented the
test in a very favorable light. If a counselor was not aware of the
limitations of the method, he might place wholly unwarranted
confidence in the scores. All things considered, the counselor will
do well, where he has a choice, to select those tests that employ a
method other than the split-half in determining reliability. The
fact that the split-half method is most economical for the publisher
will not be of great consequence to the counselor whose main concern
is with accuracy of measurement of individuals.

** This observation is reflected in the decision of the authors of the Differential
Aptitude Test Clerical Aptitude subtest to ignore wrong answers in scoring that
[...] manual: "In the group of two hundred and forty-[...]
students, scoring for rights only and scoring by the rights minus
[...] formula resulted in only four scores which differed at all, and [...]

** [...] Psychological Corp.

The test-retest method. The test-retest method is used frequently
when there is only one form of a test. The same instrument
is administered to the same group on two occasions and a
coefficient of correlation of the scores on both administrations is
computed. The requirement that the two tests be equivalent is satisfied
in the test-retest method because the same instrument is used
in each case. This advantage does not, however, guarantee accuracy
of measurement. The reader will recall, in this regard, one of the
questions implied in the first part of this section. It may be stated
as follows, "Since the items to which the student reacted on a test
were just a selected sample of all the questions in the same area
that could be asked, how consistent would he be if he were asked
to react to another group of items covering the same area of
measurement?"

This is an important question, for in counseling we are very
much interested in obtaining a good estimate of a student's performance
in a particular area. Since in the test-retest method the same
sample of the "universe" is being used twice, it is impossible to
cover more than one sample of a total performance. As a result an
inaccurate picture may appear, for in actual work or training situations
the subject will usually be required to perform in areas beyond
those covered in the test sample. Because sampling of items
is one of the sources of variance in performance, it should be allowed
to operate in the test situation. It cannot do so when the
same test is repeated.

The factor of memory in the test-retest method may tend to
raise estimates of reliabilities and boredom in repetition may lower
them. These factors operate more strongly when the lapse of time
between test administrations is brief.

Equivalent forms method. A third method of determining
reliability coefficients requires the administration of alternate forms
of the same test. The method requires that the two forms be
equivalent in type, number, and level of difficulty of the items. If
it is reported in a test manual that two forms of the test were
constructed at the same time from the common pool of items that
was used in a preliminary tryout, the counselor can be relatively
certain that these requirements are met. In general, the alternate-forms
method will provide for the counselor the most usable estimate
of reliability of the three major methods. Since the method
requires that the tests be given with a time lapse, the day-to-day
fluctuations of behavior of the individual will have a chance to
influence test performance as would be the case in actual work or
study situations. Further, because the two tests represent two samples
of the total area measured, they offer a more complete check
on performance. The counselor has at least a partial answer to the
question about whether the student's performance is representative
of what he really can do with the types of problems and questions
presented in the test.

Kuder-Richardson method. A final method applied in
the determination of test reliability utilizes analysis of variance procedures
to determine the consistency with which subjects respond
to the test items. While several formulas based on this principle
have been derived, the one most widely applied is known as the
Kuder-Richardson.** It has some of the limitations of the split-half
method in that it cannot be applied to speed tests. Moreover, it
does not reflect the day-to-day variance that might be found in an
individual's performance because it is computed, as in the split-half
method, from a single test administration. Like the split-half
method, then, it fails to afford the desired evidence of consistency
over a period of time.

** For a detailed discussion of this method see Robert L. Thorndike, "Reliability,"
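
A rough sketch of a Kuder-Richardson computation may make the single-administration character of the method plain. The text does not say which of the Kuder-Richardson formulas is meant; the sketch assumes the one usually labelled formula 20, for items scored right (1) or wrong (0), with the variance of total scores computed by dividing by the number of examinees. The function name `kr20` and the response data are hypothetical.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    item_scores holds one list per examinee, each item scored 1 or 0.
    Assumes population variance (divide by n) for the total scores.
    """
    n = len(item_scores)          # number of examinees
    k = len(item_scores[0])       # number of items
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion passing item i
        sum_pq += p * (1 - p)                       # item variance p * q

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of six examinees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

print(f"KR-20 = {kr20(responses):.2f}")
```

Everything in the computation comes from one sitting, so nothing in the resulting figure can reflect how the same examinees would perform on another day.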

FACTORS INFLUENCING RELIABILITY

In the brief discussion above, an attempt has been made to 
describe the methods, limitations, and advantages of the methods 
for determining the reliability of tests most commonly employed. 
Some additional conditions to which the methods described are 
sensitive and that will be important in the interpretation and use 
of reliability data are presented below. 

Reliability coefficients are affected by the range of talent rep- 
resented in the group to which the test is administered. In general, 
the wider the range of talent represented within the group whose 
scores are used in determining reliability coefficients the higher 
they will be. If the counselor accepts the coefficient reported in the 
manual he assumes that the group on which he is going to use a 
test can perform over similar ranges as the group on which the 
reported coefficient was obtained. This means, as a minimum, 
that the manual should indicate the number, age, sex, and edu- 
cational levels of the subjects from whose scores the reliability 
coefficients were computed.** While it is seldom done, it would 
further enhance the accuracy of the reliability data if several in- 
dependently obtained reports of test reliability from various geo- 
graphic areas were presented. 

The effect that range of talent can have on reliability data is
suggested by Travers:

in E. F. Lindquist (Ed.), Educational Measurement. Washington, D.C.: American
Council on Education, 1951, pp. 586-594.

** Technical Recommendations for Psychological Tests and Diagnostic Techniques.
Psychological Bulletin Supplement, March, 1954, p. 32.




A test developed to measure knowledge of the names of tools
consisted of 120 multiple choice test items. When this test was
administered to all the high school seniors in a small city, it was
found to have a split-half reliability of .95 and the scores ranged
from 20 up to 115. However, when the same test was administered
to a group of machinists, the scores ranged from 106 to 117 and the
reliability estimated from this group was .20. In the case of the machinists,
differences in scores would be considered almost meaningless,
and this is reflected in the low reliability coefficient derived from
this group.**

It can be seen from the example that the counselor cannot afford
to be enticed into playing guessing games with a test manual
and that he cannot assume that a reported coefficient can be applied
with confidence to any group of subjects.
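
The dependence of the coefficient on range of talent can be sketched numerically. The relation used below is a textbook one and rests on an assumption that should be stated plainly: that the error variance of the test is the same in the wide-range and narrow-range groups. The standard deviations of 20 and 5 are hypothetical; Travers does not report them, and they are chosen only because they happen to reproduce a drop from .95 to about .20. The function name `reliability_in_new_group` is likewise an invention for this sketch.

```python
def reliability_in_new_group(r_old, sd_old, sd_new):
    """Estimate reliability in a group with a different spread of scores.

    Assumes the error variance, sd_old**2 * (1 - r_old), is unchanged
    from the original group to the new one.
    """
    error_variance = sd_old ** 2 * (1 - r_old)
    return 1 - error_variance / sd_new ** 2


# Hypothetical spreads: seniors vary widely, machinists hardly at all.
seniors = reliability_in_new_group(0.95, sd_old=20.0, sd_new=20.0)
machinists = reliability_in_new_group(0.95, sd_old=20.0, sd_new=5.0)

print(f"wide-range group:   {seniors:.2f}")
print(f"narrow-range group: {machinists:.2f}")
```

The test itself has not changed between the two lines of output; only the spread of the people taking it has.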

The reliability of a test may not be equal at all parts of the
range that the test is designed to measure. If the test is designed
to measure a particular skill or performance over Grades 9
through 12, the reliability coefficient may vary at each grade
level. An example of differences in the reliability estimates found
among grades on a new test of clerical aptitude ** is presented in
Table 7.

The reader will note that the reliability coefficients on the verbal
skills subtest range from .73 for ninth graders to .90 for a combination
group of eleventh and twelfth graders. While the coefficient
of .90 for eleventh and twelfth grade students is fairly
adequate for individual counseling, the .73 for the ninth graders
might prove to be unsatisfactory for that purpose. It may be noted
that the coefficients reported for the two groups of ninth graders
and two groups of eleventh graders are different, and, in the case
of written directions for ninth graders, the difference is .10. The
tests seem to be most reliable at the eleventh grade level. The
manual indicates that, since the coefficients were obtained by the
split-half method, they may be slightly overestimated.

Table 7. Split-Half Reliability Coefficients

Measures               Grades      N       r
Verbal skills            9         50     .73
                         9         53     .77
                        11         47     .83
                        11         48     .80
                        11-12     320     .90
Number skills            9         50     .82
                         9         53     .87
                        11         47     .94
                        11         48     .88
                        11-12     215     .83
Written directions       9         50     .77
                         9         53     .87
                        11         47     .89
                        11         48     .88
                        11-12     216     .85
Learning ability         9         50     .83
                         9         53     .90
                        11         47     .93
                        11         48     .89

** Robert M. W. Travers. Educational Measurement. New York: The Macmillan
Co., [...]

** [...] Test. Yonkers, N.Y.: World Book Co., [...]

The number of items in a test may affect the size of reliability 
coefficients. In general (though there is a point beyond which this 
may not be true) the longer the test the more reliable it will be. 
This factor will not ordinarily be of much concern to the counselor 
when a test yields a single score. It will be of importance, how- 
ever, when a test is designed to provide subtest scores as compo- 
nents of the overall function of the test. This is the case in some 
“diagnostic” tests. Of importance to the counselor is the fact that 
the reliabilities of these subtests are nearly always low and fre- 
quently too low to be used for differential diagnosis. This is true 
largely because the relatively few items in the subtests provide an 
inadequate sample of the total number that might have been used. 
It is important, then, that subtest reliabilities be reported in addition
to that of the test as a whole. If only the latter is reported the
counselor must not attempt to interpret or use the subtest scores.
In this regard, it is refreshing and encouraging to find the following
note in the manual of a test just recently introduced. "The user
of the School and College Ability Tests cannot be reminded too
emphatically that the scores of individuals on the four subtests
should NOT be interpreted separately. The part scores and total
scores for which the interpretive materials provide recording spaces
and normative data are reliable enough for individual use, but
separate subtest scores should never be recorded for individuals."**

The appearance of such a warning in a test manual is an encour- 
aging sign that we may look for increasing responsibility on the 
part of test publishers. Unfortunately, the example is the exception 
rather than the rule. It is not uncommon to find test manuals en- 
couraging interpretation of reliability estimates far beyond those 
permitted by the data. 
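
The relation between test length and reliability that lies behind the caution about subtest scores can be illustrated with the general form of the Spearman-Brown formula. The sketch is hedged in two ways: the formula assumes that the added or removed items are comparable to those retained, and the figures (a full test with reliability .90 and a subtest one quarter as long) are hypothetical. The function name `spearman_brown` is an assumption for illustration.

```python
def spearman_brown(r, length_factor):
    """Estimated reliability when a test is made length_factor times as long.

    length_factor below 1 describes a shortened test, such as a subtest.
    Assumes the items added or dropped are comparable to the rest.
    """
    return length_factor * r / (1 + (length_factor - 1) * r)


# Hypothetical case: the whole test has reliability .90 and a subtest
# contains one quarter of the items.
subtest = spearman_brown(0.90, 0.25)

# Hypothetical case: a test with reliability .60 is doubled in length.
doubled = spearman_brown(0.60, 2)

print(f"subtest of one quarter length: {subtest:.2f}")
print(f"test doubled in length:        {doubled:.2f}")
```

On these assumed figures a respectable coefficient for the whole test leaves the quarter-length subtest well below the level usually wanted for work with individuals.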

Such factors in the administration of tests as instructions, timing,
and motivation may influence the reliability of tests. When a
person accepts reliability coefficients he assumes that the data were
obtained by uniform administration of the test. He must also assume
that the reported estimate will not be obtained if he departs
from the instructions in the manual.

It may be obvious to the reader by this time that, with many
factors operating to influence scores and consequently reliability
estimates, a perfect reliability coefficient of 1.00 is never obtained
in educational and psychological testing. It may also be seen that,
depending upon the degree of error in measurement, the scores
obtained on students vary so much that "true" scores are never
obtained.

A "true" score could theoretically be obtained if we gave a
student an infinite number of samples of the task to perform and
averaged his scores. Because coefficients are not based on true
scores, but rather on those that are obtained at the time of sampling,
reliability coefficients may be depressed. This fact has occasionally
prompted test designers and publishers to correct the
reliability for "shrinkage due to errors" and the counselor may,
therefore, encounter reliability coefficients that have been "corrected
for attenuation." This correction means that the reliability
has been computed on the basis of estimated true scores rather than
obtained scores. Thus, the errors of measurement that are present
in actual scores are reduced or theoretically eliminated and the
resulting coefficients are likely to be considerably higher than those
obtained when actual scores are used. Thorndike summarized the
implications of the above very well: "Practical prediction must be
done with existing fallible tests. To some extent it is misleading
to present corrected correlations between hypothetical true measures.
The prediction which could be achieved by a hypothetical,
perfectly reliable test may be quite misleading because such a
test is never available to us."**

** [...] Tests. Princeton, N.J.: Cooperative Test Division, Educational
Testing Service, 195[...], p. [...]

Among other things, since day-to-day fluctuations that may affect
test performance are present in human performances there seems
little justification to remove them from the test situation. If corrected
coefficients are presented, the uncorrected coefficient should
also be indicated.
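
How much a correction for attenuation can inflate a coefficient is easily shown. The sketch assumes the form of the correction most often given in measurement texts, in which an observed correlation between two measures is divided by the square root of the product of their reliabilities; the manuals the counselor meets may apply some other variant. The function name `correct_for_attenuation` and all three coefficients are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores on measures x and y.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures. The result describes hypothetical error-free tests.
    """
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical figures: an observed correlation of .40 between a test
# (reliability .80) and a criterion (reliability .50).
observed = 0.40
corrected = correct_for_attenuation(observed, r_xx=0.80, r_yy=0.50)

print(f"observed coefficient:  {observed:.2f}")
print(f"corrected coefficient: {corrected:.2f}")
```

Reporting the two figures side by side, as the sketch does, lets the counselor see how much of the corrected value belongs to tests that do not exist.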

RELIABILITY OF INDIVIDUAL’S SCORES 

The reader acquainted with statistical procedures may have
noted by this time that the coefficients of correlation produced by
use of the methods described above provide a limited amount of
data regarding a particular student's score. The coefficient, for all
practical purposes, indicates the degree to which individuals comprising
a group maintain their positions from one test administration
to another but it does not tell us about the consistency of
an individual's scores. Since counselors make predictions and take
action on the basis of the scores earned by one student at a time,
they will be more concerned with the accuracy of measurement of
an individual than of groups. If a student obtains a score of 75 on
a given test (assuming that the performance represented by the
test score has implications for some future performance and the
implications vary with the magnitude of the score), the counselor
will want to know how consistently the student will score at that
level. He will know that the subject is not likely to be completely
consistent and his question becomes "How much will it vary?" If
the scores achieved by the same individual on repeated measurements
varied widely, he could not know what significance to attach
to it in terms of future performance on a job or in training at a
later time. This being the case, the counselor should demand some
evidence about the relative stability of the score with which he is
working.

** Robert L. Thorndike, "Reliability," in E. F. Lindquist (Ed.), Educational Measurement.
Washington, D.C.: American Council on Education, 1951, p. 613.

He will find some of the evidence he seeks in the standard error
of measurement. It is an estimate of the amount by which an obtained
score is likely to vary from the individual's true score. The
SEM, usually presented in terms of raw scores, indicates the range
between which persons' scores are likely to fall on retesting. Thus,
if the reported SEM for a given test was five raw score points and
a student obtained a score of 75, one could not be sure what he
would do on a second try at the test. The SEM would not indicate
what a particular person would do on a retest.
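
The arithmetic behind a standard error of measurement can be sketched briefly, using the usual formula SEM = SD x sqrt(1 - r). The standard deviation of 10 and reliability of .75 are hypothetical values chosen only because they yield the five raw score points of the example above; the function names `standard_error_of_measurement` and `score_band` are likewise assumptions.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM in raw score units: SD times the square root of (1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(obtained, sem, width=1.0):
    """Band of `width` standard errors on either side of an obtained score."""
    return obtained - width * sem, obtained + width * sem


# Hypothetical test: SD of 10 raw score points, reliability of .75.
sem = standard_error_of_measurement(sd=10.0, reliability=0.75)
band = score_band(75, sem)

print(f"SEM = {sem:.1f} raw score points")
print(f"band around an obtained score of 75: {band[0]:.0f} to {band[1]:.0f}")
```

A band of this kind describes errors of measurement in general, on the further assumption that they are about normally distributed; it does not say where any one student's next score will fall.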

A table, reproduced in part from the manual ** of a new Clerical 
Aptitudes Test, may suffice to illustrate the use of the standard 
error of measurement. 

The counselor will note in Table 8 that the median standard
error of measurement obtained from those computed for six different
groups (presumably one group for each of Grades 9, 11, and
11-12 combined, tested, and retested for each measure) on verbal
skills was 2.7. The reader will note further, however, that the 2.7
figure is the median on the verbal skills subtest obtained from
ninth graders as well as tenth, eleventh, and twelfth graders combined.
While the range of SEM represented (1.8 to 3.1) may not
mean a great deal in interpreting the scores in this case, it is obvious
that the standard errors of measurement vary with the different
groups and that the median SEM of 2.7 may be somewhat
misleading if applied equally to all groups. The significance of the
SEM of the example can be seen in the figure "16" under the heading
"In percentile terms." When it is taken at or near the mean,
the error (2.7) of the obtained score may be converted to percentile
points on the table of norms provided. From the data we can
be reasonably sure that the score will place persons on the average
between plus and minus 16 percentile points of the percentile indicated
by the obtained score.

Table 8. Range and Median Standard Errors of Measurement

                             Number                            In Percentile
Measure                      of Groups    Range*     Median    Terms
Verbal skills                    6        1.8- 3.1     2.7         16
Number skills                    6        1.8- 2.3     2.2         13
Written directions               6        1.7- 2.4     2.0         21
Learning ability                 6        3.8- 5.0     4.6         13
Clerical speed                   6        5.4- 7.7     5.8         16
General clerical aptitude        6        8.2-14.0    11.7         12

SE meas. = σ√(1 − r), based on groups on which test-retest and split-half reliability
coefficients were computed.

* In raw score terms.

** [...] Test. Yonkers, N.Y.: World Book Co., [...]

It can be seen that, for a given grade and SEM, the range of
percentiles in which scores are likely to fall will vary. It must also
be noted that the magnitude of the SEM in terms of raw score
points may not tell the whole story even for a group, for, referring
again to Table 8, the lowest SEM, 2.0, on written directions, has
the highest percentile range, and it is from the percentile rank that
we ordinarily draw the implications. The results of such computations
fail, however, to indicate how consistent a particular counselee
will be.


LONG-TERM STABILITY 

Before closing this discussion of reliability it may be of value
to consider the additional problem of stability of scores over long
periods of time. Much of counseling is concerned with the problem
of variability in later performances of counselees and, when
counselors use tests, they do so in the hope that the results will
help with this problem.

The reader will recall from the first chapter that counselors are
asked such questions as these: Do I have what it takes to succeed
in college? What curriculum should I follow when I enter high
school next year? What is the likelihood that this student will complete
an apprenticeship in the machinist trade? These questions
[...] range from months [...] in the time [...]

The need for some long-term evidence of stability of test scores
has not been entirely ignored in the history of testing. Traxler recognized
the problem in intelligence [...] some twenty years ago
when he commented:

[...] if the tests were given with [...] than it would be [...]
This situation is [...] However, in view of the fact that [...]
entrance to high school are [...] recorded and used for
several years by the school, the [...] of tests
administered [...] of considerable practical
importance.**

** Arthur E. Traxler, [...]




Thorndike anticipated the problem more recently when he
stated that "of course, for some purposes we may be interested in
consistency of performance over an extended period of time, but
consistency of this type represents a rather difficult concept of reliability."**

He writes further to the point when he suggests that "for use
in connection with predictions and evaluations extending over
some period of time, the meaningful procedure appears to be to
retest with a similar time interval."

Referring to this concept as "stability of scores," the committee
that prepared the recent Technical Recommendations regards as
"essential" that "the manual should indicate what degree of stability
of scores may be expected if a test is repeated after time has
lapsed. If such evidence is not presented, the absence of information
regarding stability should be noted."**

It further observes that "most educational and psychological 
tests measure qualities which are presumed to be stable for some 
time, unless training or specific experience intervene.” 

While it has taken a long time for a recognition of this concept
to be translated into application, there is a suggestion that, with
the appearance of new tests, the counselor will have some helpful
data. A recent example is found in the excellent manual accompanying
the School and College Ability Tests. This manual,
following closely the Technical Recommendations, contains the
following statement: "The coefficient of stability, a correlation
between scores earned by the same students on different forms of
the test before and after a substantial period of time, often is a
useful measure of the stability of scores. Although the SCAT series
measures abilities that are expected to change under good instruction,
stability characteristics are under study. No stability data
are available at the time of publication, however."**

** R. L. Thorndike in E. F. Lindquist (Ed.), Educational Measurement. Washington,
D.C.: American Council on Education, 1951, pp. 571-617.

** Technical Recommendations for Psychological Tests and Diagnostic Techniques,
p. 32.

It may be encouraging to the counselor to know that some
publishers are cognizant of the need for such data and that, while not
commonly available at present, they may be forthcoming. In the
absence of such data it is, of course, possible for counselors to do a
little experimentation of their own.
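
Such local experimentation can be as modest as correlating the scores that the same students earned on two administrations separated by a year or more. The sketch below is one way to do so in Python; the function name and the scores are hypothetical illustrations, not figures from any test manual, and the Pearson correlation is assumed here as the measure of stability.

```python
from math import sqrt


def stability_coefficient(first, second):
    """Pearson correlation between two administrations for the same students."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores for at least two students")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each administration")
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores for six students tested in grade 9 and again in grade 11.
grade9_scores = [21, 34, 28, 40, 25, 31]
grade11_scores = [24, 36, 27, 43, 30, 33]

r = stability_coefficient(grade9_scores, grade11_scores)
print(f"coefficient of stability: {r:.2f}")
```

A coefficient computed on so few cases is only a rough guide; a counselor would want many more students before leaning on it.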
The use to which a counselor puts test results will determine
the need for information on long-term stability of scores. If, for
example, test results are used to put students into subgroups of a
class that is already formed, it is unlikely that such long-term data
will be necessary. If, on the other hand, the counselor expects to
use the results, as one recent test proposes, "To aid students at
the eighth or ninth grade levels in the selection of appropriate
high school programs," ** the long-term reliability may be of
considerable importance.

Vic'S"' T“ f production of tests 

L u'l,’' P«‘0d of time 

■s indeed a challenging task. I, iruy require new approaches to 

likT'toT'ii T'' ‘T"'"' counseling is not 

dtti b“t > f “So'Scanlly in respect to the use of predictive 

mumlity does not guarantee y^rrv 

he that he doesTJtlave roanments, but he must real- 

Ot the limitations of those he doS^ve“Ht“‘l‘?““®i'' 'u 

“• N.Y.: Wo,ld Book Co, ,. 




ing the test results in counseling. In doing so, he will, of course, 
keep in mind that with the best reliability data currently available
he still may not have what he wants, evidence that the test he has
selected will actually do what it purports to do. In the usual
evidence of reliability presented in test manuals the counselor merely
has an indication of the consistency with which the test currently
measures something. But he must look to evidence (discussed
previously under validity) that the performance represented by the
score has meaning and implications in terms of the problems with
which he and his counselees are working.

It seems most unfortunate that the term "reliability” has been
appropriated from common usage and employed as a technical
term in measurement. To many persons reliability connotes
dependability, trustworthiness, and generally high value. It should
be noted that a highly reliable (in the technical sense) test may
be useless in counseling. The counselor will look beyond
the elaborate tables of reliability coefficients frequently found in
the catalogues of test distributors to find evidence of high
predictive validity. If he finds the latter he need be less concerned with
the former.


NORMS 

Discussion of the various forms in which normative data appear
may be found in any elementary textbook on educational
measurement, and brief statements about the merits and limitations of the
several forms of derived scores appear in Chapter V of this
volume. It is suggested there that percentiles, despite the fact that
they do not represent equal units on a scale, are probably the most
useful of all derived scores in a counseling process requiring that
results of a testing program be interpreted to counselees, their
parents, and school personnel.

To point out some of the difficulties of test interpretation from 
norms given in test manuals, the case of Joe is given in some detail. 




He was an eleventh grade boy who was trying to choose between 
a training program for engineering and one for auto mechanics. 
He had achieved the following raw scores on the Differential Aptitude
Test battery: **


Table 9. Differential Aptitude Scores of Counselee Seeking Guidance

Subtests                              Joe's Scores

Verbal Reasoning                           21
Numerical Ability                          20
Abstract Reasoning                         33
Space Relations                            72
Mechanical Reasoning                       63
Clerical Speed and Accuracy                32
Language Usage: I Spelling                  2
                II Sentences                4


A quick glance at the absolute size of each score might lead to 
the drawing of some false conclusions. It would appear from the 
above figures that Joe’s performance on the space relations test is 
better than on the mechanical reasoning test. The numerical abil- 
ity score, which is smaller than the clerical score of 32, suggests 
that he performed more effectively on the latter test. These raw 
scores are not meaningful until his scores are compared with the 
2,700 eleventh grade boys whose scores are reported in the test 
manual. Using these data for comparisons in terms of percentiles,
the counselor would see Joe’s standing as follows: 


Table 10. Differential Aptitude Scores as Percentiles

Subtest 

Joe's Scores 

Percentiles 

Verbal Reasoning 

Numerical Ability 

21 

20 

33 

72 

63 

35 

Abstract Reasoning 

Space Relations 

50 

55 

Mechanical Reasoning 

SO 

Clerical Speed and Accuracy 

J2 


Language Usage; I Spelling 

2 


11 Sentences 

4 

3 




The use of such percentile norms permits the counselor to
compare Joe's scores with those of other high school boys of similar
grade levels, and to make some assumptions about his relative
strengths and weaknesses on the factors measured by the subtests
of this particular test battery. Joe's performances might have been
compared with other groups with known characteristics, such as
engineering students, engineers, auto mechanics, or college
freshmen. In each case his percentiles would differ from those obtained
by comparison with high school boys at his own grade level.
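
The dependence of a percentile on the norm group can be made concrete with a short sketch. The Python below assumes the common convention that a percentile rank is the percent of the norm group scoring below a given raw score, with half of the tied cases counted; the function name and both sets of norm scores are hypothetical and are not taken from the Differential Aptitude Test manual.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting half of any ties."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Hypothetical norm groups; a real norm group would contain hundreds of cases.
eleventh_grade_boys = [30, 35, 38, 41, 44, 47, 50, 52, 55, 60]
engineering_freshmen = [44, 50, 53, 55, 57, 59, 61, 63, 66, 70]

raw_score = 50
pr_high_school = percentile_rank(raw_score, eleventh_grade_boys)
pr_engineering = percentile_rank(raw_score, engineering_freshmen)

print(f"against eleventh grade boys:  {pr_high_school:.0f}")
print(f"against engineering freshmen: {pr_engineering:.0f}")
```

The same raw score stands well up in the first group and near the bottom of the second, which is why the composition of the norm group has to be known before a percentile is interpreted.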

CHARACTERISTICS OF NORM GROUPS 

Before the counselor can use the percentiles presented above, he 
must become thoroughly familiar with the selection and descrip- 
tion of the normative groups used by the authors of the test. He 
would need information about the size of the standardization pop- 
ulation and about the subject’s socioeconomic level, geographic 
location, and educational experience. If he is counseling with the 
boy about some vocational or educational decision, he must know 
about the performances of members of a group who have entered 
the kind of training that Joe is considering and about the test
performances of some successful members of the occupation. He will
seldom find adequate information about such matters in test
manuals.

To help Joe to answer his query about whether his test perform- 
ances are most like engineers or auto mechanics, the most meaning- 
ful norms would be those that permitted comparison of his scores 
with those of similar students who had considered such occupa- 
tions while they were in high school, had subsequently completed 
such training, and had entered the occupations. Such norms would 
provide comparative information at the high school level and at 
several strategic points in training and employment. And those 
data should be based on the continuous testing of performance of 
the same groups before and during training. Since no sizable body





Table 11. Types of Norms Reported in Manuals of Representative Aptitude Tests

Differential Aptitude Tests
    Type of Norm Provided:
        1. Educational: Grades 8-12.
        2. Sex.
    Size of Norm Groups:
        1. Ranges from 2,100 for 12th grade boys to 7,400 for 9th grade girls.

Test of Mechanical Comprehension (Bennett)
    Type of Norm Provided:
        1. Educational: Grades 9-13 for men.
        2. Occupational: applicants for jobs.
        3. Women: college freshmen, trainees and applicants.
    Size of Norm Groups:
        1. Ranges from 300 12th graders to 833 9th graders.
        2. Ranges from 143 candidates for engineering jobs to 2,217
           applicants for job of mechanic's helper.
        3. Ranges from 111 college freshmen to 1,090 trainees in an
           airplane factory.

Minnesota Paper Form Board (Revised 1948)
    Type of Norm Provided:
        1. Educational: Grades 10-13 through 5th year engineering.
           Include some age and sex groups.
        2. Occupational: by sex, includes shop work applicants to Time
           and Motion Study engineers.
        3. Geographic: New England 11th and 12th graders, Minneapolis
           9th graders, Illinois Institute of Technology freshmen, etc.
    Size of Norm Groups:
        1. Ranges from 113 male engineering juniors to 1,288 12th grade
           boys.
        2. Ranges from 119 men in a leadburner's course to 994 male
           prison inmates.
        3. Ranges from 178 9th grade girls to 46,943 9th grade boys and
           girls.

MacQuarrie Test for Mechanical Ability
    Type of Norm Provided:
        1. Age: 10 years to adult.
        2. Sex.
    Size of Norm Groups:
        1. Size of groups not given.
        2. 1,000 males and 1,000 females.

SRA Mechanical Aptitudes
    Type of Norm Provided:
        1. Educational: [illegible], by sex.
        2. Occupational: male trainees for mechanical occupations.
    Size of Norm Groups:
        1. Ranges from 29s ]Jth grade girls to 2,240 9th grade boys.
        2. 630 male trainees.


of such norms exists for any test today, the counselor must
consider several alternatives.

He might, for example, use the tenuous assumption that there




are no important differences in crucial psychological and
educational factors between high school subjects and groups in training
or on the job. In this case he would make comparisons of Joe’s 
performances with those of such groups. Authors of several com- 
monly used tests that are said to provide assistance in making voca- 
tional decisions offer norms for many groups whose characteristics 
are known. Examination of the list of norms provided for such
tests (Table 11) suggests that the authors have decided that
educational, occupational, age, and sex groups are different enough to
need special norms. 

INTERPRETING NORM DATA IN COUNSELING

While an array of test scores obtained from all the tests given 
in Table 11 might be far more than a counselor in a public high 
school would have at his disposal, it is interesting to speculate how 
he could use them in counseling with Joe. The emphasis at this 
point is on norms and it will be assumed for the moment that the 
criteria of validity and reliability, as discussed above, have been as 
adequately met as current tests can do so. 

The Differential Aptitude Test scores previously mentioned are 
based on the test performances of 2,700 eleventh grade boys. The 
counselor has been given no information about the tentative or 
actual vocational plans or specific educational histories of mem- 
bers of this group. Without them Joe can only be shown that, in 
comparison with this group of his peers, his performance on the 
mechanical reasoning subtest is equaled or exceeded by only 3 per- 
cent, and his performance on numerical ability is average. If the 
counselor applies standard error of measurement concepts rigor- 
ously, even less can be told of this particular counselee’s per- 
formance except that, in the areas assumed to have meaning for 
potential engineering students, his scores lie in the middle and 
upper ranges of eleventh grade boys. 
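
The standard error of measurement mentioned above is conventionally estimated from the standard deviation of the norm group and the reliability coefficient as SD times the square root of (1 - r). The Python sketch below applies that formula to form a band around an obtained score; the standard deviation, reliability, and score are hypothetical values chosen for illustration and do not come from any test manual.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical estimate: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(score, sd, reliability, z=1.0):
    """Band of +/- z standard errors around an obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


# Hypothetical values: raw score 63, norm-group SD of 10, reliability of .91.
low, high = score_band(63, sd=10.0, reliability=0.91)
print(f"obtained score 63, band of one standard error: {low:.1f} to {high:.1f}")
```

When two subtest bands overlap, the counselor has little ground for telling the counselee that one performance is better than the other.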




A bit more interpretation is possible in view of evidence about
subsequent progress of small numbers of these 2,700 students who
entered training for the field of engineering or entered engineering
or allied occupational fields. Bennett ** reported that 53 subjects
who had completed engineering training by 1955 had scored on
the average at the 87th percentile on the numerical ability section
of the test; at the 77th percentile on space relations; and at the
82nd percentile on the mechanical reasoning test while still in high
school. The average high school percentiles of the 22 men actually
engaged in engineering in 1955 were 89 on numerical ability, 81
on space relations, and 86 on mechanical reasoning. A further
breakdown of the scores of these 22 employed engineers indicated
that their average range of scores while in high school in 1947
encompassed numerical ability scores between the 60th and 95th
percentiles and their space relations scores ranged from the 55th
to 95th percentile. Taking this variability into consideration, Joe's
poorest showing, his 50th percentile on the numerical ability
subtest, falls just short of the attainment of this successful group. His
other scores, assumed to have something to do with chances for
success in training for engineering, compare favorably with the 22
employed engineers. ** Should the counselor now encourage Joe
toward engineering by stating that his chances for success are
probably above average and point up the similarity of his score to those

f *= -4er 

Te haTh t and, admit- 

-.nt.,,he™oreea„ti„„3i„Hi:"Z:::^f“t 

P«haps, when he recognizes the difficulties noted above, the 



■’-''"■Up." T„, s,„u 

ConnJtdiion of ptf«nnl« of^ a'O'onbor. I9J5, p. a. 

''rr* '■« 'H. ,oo” " ‘“.■p™. -ass on >cd.. 


re™r.u,s m,rt, ‘““'“'V. "j’ss' m >ubd 




counselor will attempt to get more test scores. As he begins to do 
so, further problems in norms arise. With the Differential Apti- 
tude subtests he had a clear-cut standardization group whose mem- 
bers were similar to his subject. With some other tests he must 
compare the boy’s performance with diverse and unlike groups 
selected at widely varying times and places, and under many and 
varied circumstances. For example, on the Minnesota Paper Form 
Board, a well-known spatial relations test which is commonly 
used in prediction of success in technical, mechanical, and engi- 
neering training, the counselor has the choice of 13 educational 
and 13 industrial groups for comparison purposes. The size and 
composition of these groups vary from 119 leadburner trainees 
in Delaware to 1,123 ninth and tenth grade superior boys and 
girls, who were clients of the Cleveland Jewish Vocational Service. 

If the counselor compares Joe’s raw score of 45 with the first of 
these groups, the leadburner trainees, he finds that it lies between 
the 80th and 90th percentiles; a comparison with the Cleveland
boys and girls places him at the 70th percentile. He may also
compare the boy’s performance with 334 male International Business
Machines customer engineers at Endicott, New York, and find 
that his score places him at the 35th percentile. Perhaps the most 
appropriate available norms for this particular case would be the 
344 first year engineering students at Northeastern University. 
Joe’s scores place him at the 60th percentile for that group. The 
data were gathered prior to 1941 and it is, therefore, likely that 
the characteristics of first-year engineering students and engineer- 
ing curricula have changed, particularly since the beginning of the 
Korean War and the consequent strong recruiting drive for en- 
gineering students. And since the performance measured by the 
Minnesota Form Board is just one variable that is frequently re- 
garded as necessary for successful training in engineering, it is 
likely that the counselor and the counselee will become discour- 

Manual for the Revised Minnesota Paper Form Board Test. New York: The
Psychological Corporation, 1948, pp. 12-15.




aged in their quest for a simple yes or no answer from tests to the 
question: "Do I have enough talent to train as an engineer?”

As the next step, the counselor might attempt to use scores on 
the Science Research Associates Primary Mental Abilities test to 
help Joe to answer his questions. Here his performance on Number,
Space, and Reasoning subtests — all subtests that the manual im- 
plies would be helpful for prediction in this area — were at the 
65th, 50th, and SOth percentiles respectively on the age norms 
given. Reference to the manual for this test reveals that the norms 
were based on the performance of 18,000 high school students. 
Unfortunately, this group was not further described or differenti- 
ated by sex, location, achievement, socioeconomic position, or vo- 
cational plans. Research** has shown that there are clear and 
important sex differences in scores on this battery that should be 
taken into consideration. Even with supplementary information 
about normative data gleaned from research literature, the coun- 
selor’s interpretation of scores on this test must be limited. 

Questions about prediction of success in training can be answered
only if specific and useful norms are provided that permit
comparisons with students in general, students in specialized
training, successful entrants to the occupation, and later
performance by members of the occupational group.
Such data could be produced by longitudinal studies if sizable
groups were followed through their preparation, training, and
work experience. On the reverse side of the "self-interpreting"
profile of the Primary Mental Abilities test, the counselee may
read that an engineer needs the "ability" to visualize objects in
space and the subtest "Space" measures this quality. This kind of
oversimplification and self-interpretation is highly misleading and
dangerous. The counselor who looks for evidence to justify this
statement will not find any and having done so he should reject
the test.


Frederick ... "... Primary Mental Abilities Test." Educational and
Psychological Measurement, 1954, 14:687-689.




Little will be gained from appraisal of Joe's scores on the 
Science Research Associates Mechanical Aptitudes Test where the
only comparison of his performance that may be made is in terms 
of 650 male trainees, high school graduates, nongraduates, and 
veterans who were in training for mechanical jobs as apprentices. 
No other data about the norm group are given! 

More detailed norms are provided for the Bennett Test of
Mechanical Comprehension (Form AA). ** Here the counselor has
his choice of educational norms that present the performance of
eleventh grade male high school students, technical high school
seniors, and engineering school freshmen. Occupational norms are
offered for candidates for engineering positions, men in the Works
Progress Administration mechanical courses of depression days,
trainees in an airplane factory, applicants for jobs of mechanic’s
helper, and several other industrial groups. Detailed and specific
descriptive data about these groups and information pertaining to
their selection are missing.

Joe's score of 44 may be converted into percentiles for each of 
these norm groups as follows. 

Table 12. Conversion of Scores to Percentiles

Group Compared                                   Percentiles

Eleventh grade male high school students             75
Technical high school seniors                        65
Engineering school freshmen                          30
Candidates for engineering positions                 13
Men in WPA mechanical courses                        90
Trainees in an airplane factory                      35
Applicants for jobs as mechanic’s helper             73


Now Joe and his counselor may become involved in a series of 
mental gyrations as they attempt to choose the most meaningful 
group or groups with which his performance should be compared.
One might be tempted to generalize to say that a comparison of 

Manual for the Bennett Test of Mechanical Comprehension. New York: The
Psychological Corporation, 1950.




Joe's performance with that of other eleventh grade high school
boys indicates that one quarter of them equaled or exceeded his
score. He might also be tempted to say that, since comparison of
his test performance with that of engineering school freshmen
shows that nearly three quarters equaled or exceeded his score,
there would be serious doubts of his "measuring up" in engineering
training. Yet the counselor cannot be sure of this since the two
groups used for this comparison were selected at different times
and in other places. The counselor has little knowledge of
important training, educational or motivational factors in the
individuals composing these groups, and he lacks information about
the selection, achievement, or mental level of the members. It may
be indicated generally that, since the score of 44 approximates
successively lower percentiles as the comparison is applied to
groups with increasingly more training (i.e., engineering
freshmen, candidates for engineering positions, etc.), the factors of
training and selection become increasingly important. There are
insufficient data, however, about the composition of these groups
to warrant the acceptance of the generalization. Joe and his
counselor must seek more data to help in making decisions.
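
The arithmetic behind statements such as "one quarter of them equaled or exceeded his score" is simply the complement of the percentile. The short Python sketch below applies it to three of the percentiles reported for Joe's score of 44; the function name is an illustrative assumption, and the reading of a percentile as the percent of the group falling below the score follows the usage in the paragraph above.

```python
def percent_equal_or_above(percentile):
    """Percent of the norm group that equaled or exceeded the score."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    return 100 - percentile


# Percentiles for a raw score of 44 in three of the norm groups discussed above.
joe_percentiles = {
    "Eleventh grade male high school students": 75,
    "Engineering school freshmen": 30,
    "Candidates for engineering positions": 13,
}

equal_or_above = {
    group: percent_equal_or_above(p) for group, p in joe_percentiles.items()
}
for group, pct in equal_or_above.items():
    print(f"{group}: {pct} percent equaled or exceeded a score of 44")
```

The figures grow as the groups become more selected, but, as the text cautions, the groups were tested at different times and places, so the trend is suggestive rather than conclusive.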

In the manual of the MacQuarrie Tests for Mechanical Ability
one may find the statement that ". . . [it] has been used to
measure the aptitudes of more than 5,000,000 persons. . . .” This
impressive number of users must have had difficulty in interpreting
their scores, since only undefined age and sex norms are provided
for the instrument. Some attempt has been made in subsequent
studies ** reported by the publishers to present "tentative" standards
of performance for groups of operators of gum-wrapping
machines, sewing-machine operators, leather workers, and aircraft
workers. Careful examination of this body of norms fails,
however, to indicate the numbers used, selective factors involved,

MacQuarrie Tests for Mechanical Ability. Summary of
Investigations, Number 2. Los Angeles: California Test Bureau, 1950.




geographic location, or any other pertinent information about the 
subjects on which the norms were based. 

LOCAL NORMS 

When test publishers fail to provide adequate norms the coun- 
selor may establish local norms. This task can usually be ac- 
complished over a long period of time by amassing enough cases. 
It is a difficult and expensive process but it may be necessary to 
supplement and give added meaning to the scant normative infor- 
mation furnished by most test publishers. Often, because of pe- 
culiarities of the population with which he is working in relation 
to locale, ethnic origins, educational level, mental abilities, or other 
selective factors, the counselor will find his groups considerably 
above or below so-called national norms. 
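
Once enough local cases have been amassed, turning them into a table of local norms is largely a matter of counting. The Python sketch below builds a raw-score-to-percentile table from accumulated local scores, using the midpoint convention (percent below plus half of those at the score); the function name and the ten scores are hypothetical, and a usable table would rest on far more cases.

```python
from collections import Counter


def local_norm_table(scores):
    """Map each observed raw score to its midpoint percentile rank in the local group."""
    if not scores:
        raise ValueError("no local scores have been collected")
    counts = Counter(scores)
    total = len(scores)
    table = {}
    below = 0  # number of local cases scoring lower than the current raw score
    for raw in sorted(counts):
        table[raw] = round(100.0 * (below + counts[raw] / 2) / total)
        below += counts[raw]
    return table


# Hypothetical raw scores accumulated locally over several years.
local_scores = [38, 40, 40, 42, 45, 45, 45, 47, 50, 52]
norms = local_norm_table(local_scores)

for raw, percentile in norms.items():
    print(f"raw score {raw}: local percentile {percentile}")
```

Placing a student's local percentile beside the published one shows at a glance how far the local group departs from the so-called national norms.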

Two examples illustrate this point. While one of the authors 
served as guidance director of a midwestern university high school 
he noted that the students, who were a highly selected group from 
superior socioeconomic class homes, scored a year or more above 
published norms on standardized achievement tests. The staff of 
this school prided themselves on this fact. Comparison of these 
students with others by use of Educational Records Bureau norms 
for private schools indicated, however, that these students were 
just at grade level when compared with their peers. In this case 
the development of local norms made realistic comparisons pos- 
sible. 

The teachers in an Indian Service boarding school in the south- 
west were greatly concerned because their charges continually 
tested at one or two years below published national norms on 
standardized achievement tests. Comparison of these same students 
with Indian Service norms showed that they were fully one year 
ahead of other Indian Service school students throughout the 
country. Less spectacular examples might be cited in numerous 
instances. 




More frequent use of local norms would permit some control 
over the circumstances of testing, the conditions of set, expectation, 
motivation, reward, and other psychologically important factors 
that must be weighed in the interpretation of scores. Norms
developed from testees who took the test under routine
test administration procedures might differ from those that
were developed only upon highly selected and highly motivated
testees. The difficulty in making any sort of a meaningful
comparison between generalized high school grade norms and those
of selected groups such as engineering freshmen or candidates for 
engineering positions is a case in point. Members of the former 
group produce a wider range of scores as evidenced by larger 
standard deviations. It must be recognized also that such important 
variables as attitude, effort, attention, or motivation are largely 
unstandardized no matter how many "standardized" time limits 
and directions are used. The effect of these factors on norms 
developed from such highly specialized groups of candidates, 
selected students, or successful trainees is such that the counselor 
is frequently required to make comparisons of unlike subjects. This
process is not likely to improve his counseling.

ADMINISTRATION OF TESTS 

The manner in which tests are administered affects subjects' 
performance and influences the interpretation of their scores. It
would indeed be helpful if the counselor could be assured that the 
score he had obtained represented the students’ best efforts but 
this cannot always be taken for granted. The counselor must be 
aware of the factors that may have been operating to make the 
test performance less than best and he must attempt to control
these factors when he is giving a test. 

The status of administration, among factors related to the use 
of psychological tests, has been well described by Traxler: 

There is, however, an equally important area (as constructing, scoring,
and the use of objective tests) in the whole process from the
building of a test to the application of the results in individual guidance
that is largely neglected. This area is the administration of tests. It
seems highly inconsistent for the producers and consumers of tests to
expend much time and energy in order to obtain the best possible
tests, to score them accurately, and to use the results professionally,
and at the same time to treat very casually one step in the process,
which, if incorrectly carried out, can invalidate all the rest of the
work. **

Writing in the same vein more recently, Traxler restated his 
position in these words: 

In view of their crucial importance in the whole chain of events from
the conception of the test to the use of the scores in conferences with 
individuals, it seems highly unfortunate that the giving and scoring 
of tests are frequently treated very casually by both the authors and 
the users of tests. Test specialists have been very dilatory in providing 
research data on the many debatable points relative to test administra- 
tion and scoring, and, in general, test makers have not applied the 
same care and zeal to the writing of directions for administering tests 
that they have applied to item validation and other technical aspects 
of test construction." 

The space and treatment given to the factor of test
administration as contrasted to other aspects of use of tests in two recent
parallel publications of major importance in the testing movement
attest to Traxler’s observations. Both the Technical
Recommendations for Psychological Tests and Diagnostic Techniques ** and
Technical Recommendations for Achievement Tests ** devote only a
few short statements to desired standards regarding test
administration. They suggest to the reader that good administration of the
testing program is taken for granted, or that this particular aspect
of test usage is of relatively little importance. Inspection of many
test manuals must lead one to conclude that these assumptions are
similarly held by test designers and publishers. It does seem,
however, that the implications of test administration are such that it
warrants considerable attention by test users.

Arthur E. Traxler. "Needed Improvement in the Administration of Objective
Tests." Fourth Yearbook of the National Council on Measurements Used in
Education, Fairmont, West Virginia: The National Council on Measurements Used in
Education, 1947, p. 3.

Arthur E. Traxler. "Administering and Scoring the Objective Test," in
Lindquist, E. F. (Ed.), Educational Measurement. Washington, D.C.: American Council
on Education, 1951, p. 329.

Technical Recommendations for Psychological Tests and Diagnostic Techniques.
Washington, D.C.: American Psychological Association, 1954.

In one sense, of course, the tendency to gloss over the impor- 
tance of the administration of tests is understandable. Proper ad- 
ministration in itself does not contribute directly to the value of a 
test. Many things can happen if the administration is not good, but 
optimal administrative practices do not enhance the basic value of 
the test. Poor administration can have a lowering effect on the 
validity and reliability of test results, but optimum administration 
in a given situation does not improve these factors above the limits 
imposed by the basic data. 

Details of test administration can be something of a nuisance 
and it is easier to avoid such questions as the following that might 
come to mind. Did the student fully understand what he was to 
do in taking the test? How much of the score was due to guessing? 
Could he do better if more time was allowed? How important is 
the factor of time for this student? How apprehensive was the 
student about taking the test? Did the apprehension influence the 
scores and if so, how much? How well motivated was the student? 
Does the score represent the best he could do? Was the student’s 
physical status normal? How diligent was he in pursuing the test 
items? How much time was consumed by daydreaming, looking 
out the window, or reacting to distractions? 

As Traxler has pointed out, research has not supplied answers
to many of the questions that might be asked about optimal test 

Technical Recommendations for Achievement Tests. Washington, D.C.:
American Educational Research Association, 1955.




administration procedures and one cannot be certain of the total 
effect of poor administration on individual performance. The 
counselor is compelled to use his judgment in evaluating the influ- 
ence of what he suspects and observes regarding administration. 
Until research reveals that the conditions and methods under
which tests are administered are of little consequence in all cases,
the counselor will have to take into account the factors that might
have negative implications for test performance and the
interpretation of individual scores.** Some of the factors related to test
administration, with their implications, are discussed in the follow- 
ing sections. 

ACQUAINTANCE WITH TEST TO BE ADMINISTERED 

Before a counselor gives a test he must be well acquainted with 
it. This acquaintance may be gained during the process of review- 
ing tests for possible use. Ideally, it would be desirable that the 
counselor take the test and have some practice in giving it. While 
the counselor probably cannot put himself completely in the posi- 
tion of the student who takes the test, he can listen for confusing 
or ambiguous directions and types of items. This process will give 
him some idea of what the student will meet and some of the 
questions he may ask. Even here, of course, the counselor will need 
to keep in mind that some sophistication on his part is assumed 
and that what he understands may not necessarily be understood 
by the student. By giving the test to those who will assist in the 
administration, the counselor will have an opportunity to become 
acquainted with details that he cannot take for granted. Regardless 
of how sophisticated he may be about testing, he cannot afford to 

For an excellent example of treatment of "Preparation for Testing" see the 
Manual for the School and College Ability Test, Princeton, N.J.: Educational 
Testing Service, 1955. In this manual such items as "General Instructions," "Schedul-
ing Tests," "Seating Arrangements," "Proctors," "Arranging Test Material," "Man-
aging the Materials in the Testing Sessions," and "Role of the Examiner in the 
Testing Session," as they relate to this test, are discussed in detail. 




make assumptions about his competence. Experience in giving one 
test does not assure proper administration of another. Test direc- 
tions may be full of surprises and students may ask questions about 
a test that require good answers. 

While the counselor in a given school will be limited by the 
facilities that are available, an attempt should be made to admin-
ister tests under the most ideal physical conditions. Suggestions for 
physical arrangements have been made in most texts on measure- 
ment, and those of Thorndike appear generally consistent with 
others. He suggests the following as characteristic of an ideal 
room for the administration of group tests. 

1. It is quiet and free from disturbance of other activities. 

2. It is well lighted and ventilated. 

3. It provides each subject with a comfortable seat and a good writ-
ing space, preferably a desk or table. 

4. It has appropriate size and shape and has sufficiently good acous-
tics, so that each person being tested can both see and hear the test 
administrator without difficulty. 

5. It provides space so that test proctors can reach any subject being 
tested, to answer questions and to inspect his work. 

6. It provides enough separation between testees to make cheating 
difficult or impossible.** 


In many high schools the school libraries come nearest to meet- 
ing the above requirements because they are equipped with tables. 
Since most current tests require use of separate answer sheets, suf- 
ficient space for them and the test booklet and comfortable arm 
support are desirable. The use of chairs with writing arms does 
not meet this need.** 


The counselor should give some thought to the seating of stu-
dents. There may be a tendency for certain noisy groups of students 

Robert L. Thorndike. Personnel Selection: Test and Measurement Techniques. 
New York: John Wiley & Sons, 1949, p. 262. 

"Effect of Type of Desk on the Results of Machine-Scored Tests." School and 
Society, September 26, 1942, 56:227-229. 




to sit together if location is entirely optional and they may become 
centers of general distraction. 

One of the assumptions that appears to be made in the adminis- 
tration and interpretation of group tests is that all students will 
have been equally interested and motivated while taking the test. 
The absence of discussion of motivation in test manuals suggests 
that the authors have concluded that high motivation is automat- 
ically obtained in group-testing situations. This is not necessarily 
the case. Students are frequently asked to take tests without being 
given any idea as to why they are taking them and how the results 
are to be used. They should know the purpose of testing and how 
the results will affect them. It is not improbable that many stu- 
dents have learned to be highly suspicious of tests. 

Ordinarily, motivation for taking tests should stem directly from 
the counseling process. This precludes, of course, the kind of coun- 
seling that starts with the administration of tests. It does assume 
that tests will be used only when, in the process of counseling, it 
is believed that test results will aid the student with his problems, 
decisions, and plans. Perhaps the most desirable arrangements for 
testing would be found when, having reached a point in counseling 
where both the counselor and counselee see the need for further 
data, the student could be tested "on the spot” by the counselor 
or referred immediately to a psychometrist. Such arrangements are 
possible in some clinics, but staff and time are not usually available 
in most high schools. Until such arrangements are possible, the 
counselor may resort to group testing. He may keep a list of stu-
dents who are to take tests until he has enough for a group admin-
istration. 

Since motivation is so important, it would seem that test authors 
and publishers would do more about it. Most test manuals offer 
little in terms of prepared statements to be read to the subjects. 
Some of the examples following are typical of the perfunctory 
manner in which students are given tests. "As soon as booklets and 





pencils are distributed, say: 'Fill in the blanks on the cover, but 
do not open the booklets.' Allow about two minutes, say: 'This is 
a test to see what you can do with your hands and eyes. Use the 
pencils provided’ . . ." The printed directions do not call for 
any statements about the purposes of taking the test or the uses to 
which the results will be put. In the opening paragraph of the 
manual it is stated unequivocally that "this battery of seven sub-
tests provides objective measurement of the aptitudes which un-
derlie successful performance of a wide variety of jobs of a 
mechanical nature." If this were true, the students should be made 


aware of this fact, and how the results will relate to their plans. 
But since one of the principles of test administration is that pre- 
pared directions must be adhered to completely, it is assumed that 
persons giving the test will not go beyond printed directions. 

The introduction of another test is similarly matter-of-fact: "To 
administer . . . address the pupils as follows: ’We are now going 
to give you some tests that measure your ability to think. I will 
pass out the test papers’ . . ." With the awareness that most 
students might have about the implications of the word think, such 
directions probably make many students apprehensive. Here again, 
as with the example above, the manual offers a list of six purposes 
. . . appear not . . . 

. . . as those given above. One . . . "Try to put the students . . . 
the tests are being administered . . . student, so . . . best efforts 
. . . particular test is about; but avoid overdoing this phase so as not 

. . . Yonkers, N.Y.: World Book Co. . . . 




to encroach on the time allotted for testing.” This does not, 
however, tell the students why they are taking tests and what 
effect the results will have on their current progress, activities, or 
their futures. Since most test manuals offer little assistance on this 
matter the counselor must resort to his own devices. 

In order to increase motivation for one group of students the 
following statement was distributed to them as they entered the 
testing room and was supplemented with similar oral comments. 


TO PUPILS INCLUDED IN THE GUIDANCE STUDY 


The tests and interviews which you are going to take will help us to 
help you to find out the things that you can do best. Because of the 
competition which exists in the world of work today, it is important 
for you to find what your abilities are in order to develop them to 
your best possible advantage. As a result of all these tests and inter-
views we hope to be able to advise you about various kinds of work 
and study. It is also our purpose to aid you in learning more about 
your own strong points and to help you to make the best of your 
opportunities. We hope that you will do your best on the tests which 
are given to you. Remember that they have nothing to do with your 
school marks. You may now ask any questions about the work.** 

Such orientation or motivational procedures will not, of course, 
guarantee best efforts of all and will not compensate for other 
factors that may cause individuals to do less than their best.** They 

Manual for Differential Aptitude Tests. New York: The Psychological Corpora-
tion, 1947, pp. 3-4. 

John W. M. Rothney. Guidance Practices and Results. New York: Harper & 
Bros., 1958. 

Another approach to this problem that appears to have many interesting possi-
bilities is the use of a recently published pamphlet by Herschel T. Manuel, Taking 
a Test: How to Do Your Best. Yonkers, N.Y.: World Book Co., 1956. This pamphlet 
is designed to give students confidence and skill in taking tests, or more specifically, 
to give the students (1) a chance to learn what tests are for, what kinds of tests 
there are, how they are built, how the results are expressed, and what they mean; 
(2) an opportunity to learn good practices in taking tests; (3) actual experience 
with test materials (p. 3). In many ways the use of such a pamphlet to orient stu-
dents could be justified, especially in terms of trying to equalize the advantage the 
testwise student may have over one who has little or no experience with tests. 




may, however, remove some of the doubts and apprehensions that 
some students have when they are taking tests. 

The sequence in which tests are presented may influence testees' 
efforts. If more than one test is planned, either in a single session 
or in close sequence, the one that is of most interest to the students 
should be given first. It is recognized, of course, that no test will 
arouse universal interest and the counselor will use his judgment 
after he has considered the nature of his group. The approach used 
in the Wisconsin Counseling Study in administering the tests of a 
differential series appeared to work well. Since circumstances re- 
quired group testing, all the students participating in the study 
were grouped on the basis of need as determined by inspection of 
individual records. Those who planned to go to college or other 
kind of training after high school were given the Differential Ap- 
titude verbal reasoning, numerical ability, and language usage 
tests.** Those who planned to enter mechanical, farming, and re- 
lated fields were given the mechanical reasoning, numerical ability, 
and space relations subtests. Finally, those who intended to enter 
clerical and related fields were given the numerical ability, cler- 
ical, and language usage subtests.** In each case the students were 
told why they were grouped as they were. Students who took the 
mechanical sequence were administered the mechanical reasoning 
test first, the training group began with the verbal reasoning test, 
and the clerical group took the clerical aptitude subtest at the be- 
ginning of their testing period. 
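
The grouping and sequencing just described can be sketched as a small lookup. This is only an illustration under stated assumptions: the names SUBTESTS_BY_PLAN, FIRST_SUBTEST, and subtest_order are hypothetical, and the order of the subtests after the first is assumed, since the text states only which subtest opened each session.

```python
# Subtests given to each group, as listed in the paragraph above.
SUBTESTS_BY_PLAN = {
    "further training": ["verbal reasoning", "numerical ability", "language usage"],
    "mechanical": ["mechanical reasoning", "numerical ability", "space relations"],
    "clerical": ["numerical ability", "clerical", "language usage"],
}

# Subtest administered first to each group, as stated in the paragraph above.
FIRST_SUBTEST = {
    "further training": "verbal reasoning",
    "mechanical": "mechanical reasoning",
    "clerical": "clerical",
}


def subtest_order(plan):
    """Return the group's subtests with the one of most interest first.

    The order of the remaining subtests is an assumption; it simply
    follows the order in which they are listed.
    """
    first = FIRST_SUBTEST[plan]
    remaining = [name for name in SUBTESTS_BY_PLAN[plan] if name != first]
    return [first] + remaining


print(subtest_order("clerical"))
```

Keeping the "first" subtest separate from the list makes the one stated rule (most interesting test first) explicit without implying anything about the rest of the sequence.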

While it is desirable that the counselor administer all the tests 
to be given to his students so that he may observe reactions to tests 
at first hand, it will not always be possible. If he cannot do so he 
will do well to take exception to the statements found in many 


George K. Bennett, Harold G. Seashore, and Alexander G. Wesman. Differential 
Aptitude Tests. New York: The Psychological Corporation, 1947. 

While not especially relevant to this discussion, it should be mentioned 
that other tests of the battery were given to students in any group on the basis of 
. . . interpreted to . . . 




test manuals, which suggest that virtually anyone who can read a 
manual can give tests. While statements of this kind may enhance 
sales by implying that a "specialist” is not needed, they take too 
many things for granted. They may assume, for example, that the 
personality and attitude of the test administrator and the rapport 
he has established with pupils in the general school situation are 
unimportant. Those who view the process of testing as essentially 
mechanical are not likely to inspire students to do their best. Those 
who have certain disciplinary relationships with students may 
not be accepted by all the testees. Those who resort to veiled 
threats and expressions of dire consequences or who may suggest 
that test performance is a life or death matter are not likely to do 
the best job. It is no doubt true that in any group situation there 
must be a certain amount of administrative control; it would seem 
equally true that the rapport in a group-testing situation should 
approach that of a one-to-one counseling situation as much as pos-
sible. As suggested earlier in this section, the counselor uses tests 
because they attempt to measure something that may have implica- 
tions for a counselee’s future performance. The performance sam- 
pled by the test presumably bears some relationship to future 
activity of interest or concern to the individual. One of the assump- 
tions too frequently made when interpreting test scores is that the 
student knew what he was to do and did it as best he could. This 
may or may not be the case. 

One important principle of testing requires that the directions 
provided in a test manual must be followed exactly as written. The 
reason for this is obvious, for the printed directions represent a 
part of the conditions and circumstances under which the test was 
standardized, the reliabilities were determined, the validity com- 
puted, and the norms constructed. It follows that if the data on 
these factors as provided in the test manual are to be used, the 
scores must be obtained in the same manner as those upon which 
the data in the manual are based. 




In selecting a group test, initially, the counselor should choose 
one that has directions most likely to be understood by the "poor-
est" of his group. He will look for tests that have concise and 
clearly stated directions. He will look for tests for which the direc- 
tions are not made too complex by the need to explain novel and 
involved physical manipulations. The directions should include 
suitable examples of the various types of items found in the test 
and provision should be made for adequate practice on such items. 
The directions of the test should anticipate as many of the stu-
dents' questions as possible (Should we guess? Can we use scratch 
paper? How much time do we have? Can we check answers if we 
finish before time is up? How do we change an answer? etc.), and 
provide specific instructions for answering them.** 

Many tests do have inadequate or confusing directions and one 
example will suffice to illustrate the point. The following direc- 
tions for one subtest of a mechanical ability test are read to stu- 
dents as they read along in the test booklet. 


This is the practice page for the LOCATION test. Notice the letters 
in the large square, and the five dots in each of the small squares 
. . . before . . . GO, but not . . . letter K . . . large square, so 
you will put a . . . (. . . SECONDS) READY, . . . STOP. In the small 
square at the left you should have V K N E K. In the one at the right 
you should have U E M O T. (. . . here for consideration of error.) 

. . . When the group was invited to ask questions regarding the 
. . . whether there was a penalty for omissions. The result . . . 
Not many months later when . . . learned that . . . penalty for 
omissions . . . another form administered to him. With his question 
answered he . . . score by 20 points. 

. . . of Directions, . . . Ability. Los Angeles: California Test 
Bureau, 1953, p. 8. 




Directions such as these would seem so complex as to confuse 
even the test sophisticate. If the counselor must use tests with 
directions as difficult as those in the example, he must be alert to 
the possibilities that students will misunderstand them. 

The counselor, however careful and demanding he may be in 
making test selections, will not find any test with directions that 
will be understood by all his subjects. The range of individual 
differences within a group being tested, the previous experience in 
taking tests, the variables such as reading skills that might relate 
to responding to directions, the "set” about tests that may cause 
an individual to misinterpret test directions make this obvious. The 
fact that most students will be able to follow the directions as 
provided in the manual does not relieve the counselor of being 
alert to the exception. 

The counselor, then, will watch during the administration of a 
test for students who seem puzzled, who look around to see what 
others are doing. He will walk unobtrusively around the test room 
after the students have started to check on the students' approach 
and progress. He will question scores obtained from papers when 
only a few items are completed and question those papers where 
the initial items are missed. He will always keep in mind that the 
usual invitation for questions included in directions may not bring 
out those in the minds of the shy students. 
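
The two checks mentioned above (papers with only a few items completed, and papers on which the initial items are missed) could be applied to recorded responses along the following lines. This is a minimal sketch, not a procedure from any test manual: the function name flag_doubtful_papers, the thresholds min_attempted and initial_items, and the sample responses are all hypothetical.

```python
def flag_doubtful_papers(responses, min_attempted=10, initial_items=3):
    """Flag papers whose scores deserve a second look.

    `responses` maps a student to a list, in test order, holding
    True (right), False (wrong), or None (omitted) for each item.
    Both thresholds are illustrative assumptions.
    """
    flagged = {}
    for student, items in responses.items():
        reasons = []
        attempted = sum(1 for item in items if item is not None)
        if attempted < min_attempted:
            reasons.append("few items completed")
        opening = items[:initial_items]
        if opening and all(item is not True for item in opening):
            reasons.append("initial items missed")
        if reasons:
            flagged[student] = reasons
    return flagged


# Hypothetical response records for three students on a 12-item test.
responses = {
    "A01": [True] * 12,
    "A02": [False, None, False] + [True] * 9,
    "A03": [True, True, False, True] + [None] * 8,
}
flagged = flag_doubtful_papers(responses)
print(flagged)
```

A flag of this kind is only a prompt for the counselor to look at the paper and talk with the student; it says nothing by itself about why the student responded as he did.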

The clinician who administers individual tests can observe the 
behavior of subjects during the testing situation and frequently 
these observations are of more help than the test scores in under- 
standing the individual. Many of the discussions of observation 
during testing emphasize the value of such observations.** It is 
possible that they may also have value in the interpretation or 
significance of particular scores. 

It is important that test behavior be observed and any behavior 

Ruth Strang. Counseling Technics in College and Secondary School. New York: 
Harper & Bros., 1949, p. 52. See also the discussion on this in Donald E. 
Super, Appraising Vocational Fitness. New York: Harper & Bros., 1949, pp. 81-84. 




that might influence the test score be noted on the cumulative or 
other test record. As a further check on test behavior and per-
formance, it is good practice for the counselor to elicit the student's 
reaction to the test situation when he is interpreting the results to 
the student. While some students may attempt to rationalize a 
poor or mediocre score, others may reveal some of the conditions 
of testing that were not observed, or observable. In general, in 
such cases it is better to schedule a retest than to refuse to give the 
student the benefit of any doubt. 


SCORING OF TESTS 


Speed, accuracy, and economy of time and effort are major 
considerations involved in any discussion of test-scoring proce-
dures. Publishers' brochures often capitalize on this concern for the 
more “practical” mechanics of test use and promote various "gadg-
ets," self-scoring devices, “Scoreze” formats, machine-scoring an-
swer sheets, transparent plastic templates, and other streamlined 
innovations to make scoring seem easy and rapid. The choice of 
a test is often determined primarily by the advertised simplicity 
of scoring. 

Counselors are interested in any time saved in the scoring of 
tests since every hour saved in this operation makes it possible to 
spend more time in interviewing students. But if the counselor is 
to approach interviews with confidence that his test scores are ac-
curate, he must pay close attention to the problems involved in 
scoring. 

In many large school systems special test-scoring staffs are as-
sembled and trained for the arduous task of test scoring. Other 
schools use professional test-scoring services of an agency such as 
the Educational Records Bureau or take advantage of the test-
scoring services offered by several test-publishing companies.** 


Arthur E. Traxler. "Administering and Scoring the Objective Test," in E. F. 
Lindquist (Ed.), Educational Measurement. Washington, D.C.: American Council 
on Education, 1951, pp. 34-35. 




While the cost may be small considering the amount of time and 
money already invested in the testing program, many schools do 
not provide for it in their budgets. 

When teachers are to score the tests, an in-service training pro-
gram is required that covers the objectives of the testing program, 
the best ways to interpret test scores, and the seemingly simple 
step-by-step scoring procedures themselves.** 
Attention needs to be given to accuracy and provision should be 
made for every paper to be scored at least twice in order to spot 
clerical errors. 
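
The double-scoring check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure taken from the text: the function name find_scoring_discrepancies, the student identifiers, and the raw scores are all hypothetical.

```python
def find_scoring_discrepancies(first_scoring, second_scoring):
    """Return papers whose two independent scorings disagree.

    Each argument maps a student identifier to the raw score recorded
    by one scorer. A paper scored by only one scorer is also reported,
    with None standing in for the missing score.
    """
    discrepancies = {}
    for student in sorted(set(first_scoring) | set(second_scoring)):
        first = first_scoring.get(student)
        second = second_scoring.get(student)
        if first != second:
            discrepancies[student] = (first, second)
    return discrepancies


# Hypothetical raw scores recorded by two different scorers.
first_scoring = {"A01": 42, "A02": 37, "A03": 51}
second_scoring = {"A01": 42, "A02": 39, "A03": 51}
to_rescore = find_scoring_discrepancies(first_scoring, second_scoring)
print(to_rescore)
```

Agreement between two scorings does not prove that a score is correct, since both scorers may have made the same mistake, but any paper returned by the check clearly needs to be scored again.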

Some schools that do not have a special test-scoring staff have 
solved their problems by appointing a teacher from each depart- 
ment or instructional level who is responsible for the administra- 
tion and scoring of all standardized tests. Under this plan it may 
be necessary to hire substitutes for these teachers while they are 
performing such duties. The advantage in this procedure is that 
training, practice, and the development of accurate techniques in 
scoring are more apt to come from persons who are given this 
specialized responsibility than when all teachers in the school are 
required to participate in the scoring program. 

USE OF SEPARATE ANSWER SHEETS 

Many tests provide a separate answer sheet for the students’ 
responses. While this simplified, mechanical aid seems to have 
distinct advantages over the older procedure of marking the correct 
answers on the test booklet, it is not without some disadvantages. 
The first of these is the effect upon the student. When separate 
sheets are used with tests whose norms are based on the booklet 
marking and scoring typical of most tests published more than ten 
years ago, some caution in interpretation is necessary. The extra 

A good plan for the operation of a test-scoring unit is given in Arthur E. 
Traxler and Others. Introduction to Testing and the Use of Test Results in Public 
Schools. New York: Harper & Bros., 1953. See Chapter 6, "How Should Tests Be 
Scored?" especially pp. 37-42. 




perceptual and manual skill called for in the manipulations of 
separate answer sheets may lead to incorrect inferences about the 
typical performance of some students. This is doubly true with 
highly speeded tests where precious seconds may be lost in the 
handling of the answer sheet. 

The use of separate answer sheets imposes some rather rigid 
limitations on the type of test item that may be used. They usually 
require a multiple-choice or true-false response in order to fit the 
structural limitations of the sheets. The counselor should be aware 
of this limitation in his inferences about performances of the 
counselee, for few of a student’s real problems appear in neat 
rows of five alternate responses.** 

One distinct disadvantage of the separate answer sheet, scored 
by an agency, is found in the extra tasks involved if the coun-
selor wishes to make a study of responses to particular items.** 
On the older type of booklet-marked achievement test it was pos-
sible to make a diagnostic summary of particular items missed by 
certain pupils. While there is nothing to prevent a counselor’s re- 
questing an item analysis from a scoring-machine operator, extra 
interest and motivation for child study seem to be called for if he 
is to go beyond the total score and use the test diagnostically. In 
some high schools where separate answer sheets are machine-scored 
and whole classes of students are given multiple aptitude test bat- 
teries, only the profile sheets that contain scores of his counselees 
are handed on to the counselor. He has very little information 
about the specific items that his counselee missed or answered cor- 
rectly. 

Manual scoring of test booklets is common for individual intelli- 
gence tests and for several of the older aptitude and intelligence 
tests. Scoring procedures for the older forms of such tests as the 


Arthur E. Traxler. "Administering and Scoring the Objective Test," in E. F. 
Lindquist (Ed.), Educational Measurement. Washington, D.C.: American Council 
on Education, 1951, pp. 384-388. 

Gerald M. Rapaport and Irwin A. Berg. "Response Sets on a Multiple-Choice 
Test." Educational and Psychological Measurement, Spring, 1955, 15:56-62. 




Otis, Minnesota Paper Form Board, and the Minnesota Clerical 
tests are done by use of fan, strip, or cutout stencils. This proce- 
dure, which allows for some diagnostic study of the counselee's 
test responses, may still be generally preferred over a separate an- 
swer sheet. 


MECHANICAL CONSIDERATIONS 

Such factors as typography of the test materials, reading ease 
of the test items, and complexity of the actual physical handling 
of the testing materials by the testee must be considered carefully 
by the counselor. Since there is very little research on these matters, 
he is forced to rely on his own judgment about their effects. 

Comparison of the typography of older tests with more recently 
released tests reveals that few changes have been made. The 
amount of white space around test items, size of type, and general 
appearance of such older tests as the National Intelligence Tests, 
published in 1920, compare favorably with tests still commonly 
used. Some departure from standard presentation has been at- 
tempted by several test publishers. The California Tests of Mental 
Maturity are printed with green ink on white paper in an attempt 
to improve ease of reading. The recently published School and 
College Ability Tests of the Educational Testing Service alternate 
red with black ink in an effort to improve the typography of their 
test. Few attempts have been made to evaluate the effects of these 
innovations on the performance of testees. Without such data the 
counselor must be careful in assessing the degree of improvement 
that these typographical variations introduce. 

In some of the older tests the items are crowded together too 
closely and seem to impose some disadvantages for those students 
with less than normal visual acuity.** The older Henmon-Nelson 

Fritz Forbes and William Cottle. "A New Method for Determining Read-
ability of Standardized Tests." Journal of Applied Psychology, June, 1953, 37:185-
90; John Pierce-Jones. "The Readability of Certain Standard Tests." California 
Journal of Educational Research, March, 1954, 3:80-82. 




Tests of Mental Ability, the earlier editions of the California Tests 
of Mental Maturity, the Van Wagenen Reading Readiness Test, 
or the Myers-Ruch High School Progress Test, all published prior 
to 1940, serve as examples of tests whose typography would seem 
to leave something to be desired. 

Several research studies indicate that some individualized tests 
require greater reading skill on the part of the testee than he is 
apt to have developed at the time of testing. 

The format of most modern standardized paper and pencil tests 
that use separate or machine answer sheets is straightforward and 
simple. The testee is given an answer sheet, a test booklet, and 
possibly an electrographic pencil. The directions are given on the 
test booklet for various subtest tasks. The counselor reads the 
directions to the subjects and may illustrate the correct marking 
procedures on the blackboard. This seems to minimize errors in 
the physical aspects of taking the test. Despite the seeming sim-
plicity of this task, several kinds of errors may still be introduced. 

If a testee places his test booklet for the Clerical subtest of the 
Differential Aptitude Tests to the side and his answer sheet directly 
in front of him he is penalized because he uses excessive eye move-
ments as he goes from one to the other and tries to keep his place 
on the complex answer sheet. The use of a paper straight edge, as 
some testees discover, makes this visual task much simpler. The 
Minnesota Paper Form Board is folded so that a hurried testee 
may miss items 17-48 without being aware that he has done so. 
Care must be taken with the SRA Primary Mental Abilities Tests 
(or any of the Science Research stepped-down test booklets) that 
the answer sheet is aligned accurately with the proper item in the 
test booklet or a whole subtest may be marked incorrectly. 

Only prolonged familiarity with and use of testing materials 
will provide the counselor with an adequate frame of reference 
with which to judge most of the mechanical aspects of tests. He 
must be constantly alert in his use of testing materials to evaluate 




the possible effects of the mechanical make-up of the test itself as 
he attempts to interpret scores to his counselees. 

SUMMARY 

In this chapter several criteria that the counselor must keep in 
mind when selecting tests have been presented. It has been pointed 
out that while many authors of tests make broad claims for useful- 
ness and application in many situations, many of these claims are 
more implied than demonstrated. It has been suggested that the 
counselor must demand evidence that the tests accomplish what 
is claimed for them. It has been further pointed out that, since 
much of the counselor's work involves predictions of some future 
performance with varying time lapses, it is essential that reliability 
data be supplemented by some evidence of stability. 

It has been shown that the separate answer sheet method is not 
clearly superior to the hand-scored answer sheet in saving time or 
providing useful data for counseling. When the answer sheets are 
checked visually for careless marking, the number of answer sheets 
that can be handled in an hour by a trained manual scorer approxi-
mates the number that can be processed by a machine-scoring unit. 
Careful analysis of costs, availability, and extra supplies should be 
demanded before the scoring machine is accepted as a solution 
to the test-scoring problem. 

It has been suggested that the counselor must look for test data 
that will describe the size of the norm group with which a coun-
selee is to be compared as well as the psychological, socioeconomic, 
geographic, and educational characteristics of the group. He will 
also seek data that will permit him to make comparisons of coun-
selees' scores with those who have entered training of the kind the 
student plans to undertake. The counselor will find, more often 
than not, that adequate data of this kind are not available in test 
manuals. 




Attention must be paid in selecting tests to such factors as ad-
ministration, scoring, and mechanical features of the instruments. 
While these are not as vital as those of validity, reliability, and 
norms, they can seriously affect the scores obtained. The counselor 
will not be misled in his selection of tests by extravagant claims of 
ease of administration and speed of scoring but will seek those 
tests whose usefulness in counseling individuals is backed by ade-
quate evidence. 

Discussion Questions and Exercises 

1. Select several representative tests of mechanical (or other) apti-
tude and construct a list comparing them on the basis of criteria 
used in validation. To what degree do these criteria afford a logical 
basis for prediction? What other criteria might have been used in 
the area of testing you selected? How dependable or reliable are 
the criteria used? Assuming that a shop foreman may not rate a 
worker's performance in the same way twice, how useful are such 
ratings likely to be as a validity criterion? Would this apply also 
to teachers' grades? 

2. As a class or group project, review 25 tests of various kinds and 
classify validity data under the headings content, predictive, con-
current, and construct. For individual counseling which is most 
useful? How many tests offer data on more than one type of 
validity? In terms of the stated purposes of each test, comment on 
the appropriateness of the type of validity data used. 

3. Below is a distribution of the percentiles achieved by 731 students. 
They took the Henmon-Nelson Test of Mental Ability while they 
were in the tenth grade and a year later in the eleventh grade. The 
reliability coefficients (test-retest) reported in the manual for the 
test were .900 and .887. The tests were administered by members 
of the faculties of the high schools. 

a. What factors might have influenced the variability in test per- 
formance indicated in the table? 

b. Could the coefficient of correlation between the two sets of 



Second administration (eleventh grade) 


Criteria of Test Selection 


in 


Table 13. Fi«t Admuustntioa (Tenth Grade) 


n 

20 


21 

30 


31 

40 


41 

50 


51 

60 


61 

70 


71 SI 91 
80 90 100 


Total 


91-100 

81-90 

71-80 

61-70 

51-60 

41-50 

31-^0 

21-30 

11-20 

0-10 

Total 


3 

1 3 

376 
1 2 11 9 

1 1 II 16 tl 

6 10 10 11 21 

4 9 9 10 8 

17 21 14 12 3 

49 14 8 3 1 

77 57 57 70 65 


2 2 2 24 53 

4 11 12 26 21 

6 17 15 21 11 

14 17 9 9 1 

10 12 11 5 

15 16 S 

14 16 1 

5 3 3 1 

2 I 1 

3 1 

75 95 62 86 87 


83 

77 

74 

66 

61 

79 

89 

52 

71 

79 

731 


scores be considered as a coefficient of test-retest reliability? A 
measure of the stability of test scores? 

c. In what percentage of the cases do you believe that the
variability of performance from one year to the other might
influence the counseling of students?

d. Does the variability indicate that the test should be given twice 
before the results can be used with confidence? 

e. In view of the distribution obtained, of what value are the
reliability coefficients reported in the test manual?

f. Of what significance is the fact that subjects at the extremes 
seem to be more consistent than those in the middle ranges? 
Would you attribute it largely to variability of individuals or 
the nature of percentiles? 

g. Using the standard errors of measurement given in the test
manual and the table of percentiles, compute the possible
variability of the subjects in the diagonals who seem to be highly
consistent.

4. Select a test for which the standard error of measurement is offered 
in the test manual. Plot the error in points against the scores on 
the percentile table. What range of scores is represented in plus 
and minus one standard error from the score equivalent to the 
fiftieth percentile? What range of scores, and hence percentile
rank, is noted when two standard errors are plotted? Three? On
the basis of these data, what are the implications for the
interpretation of test scores?

5. The following array of scores was obtained from the administration
of the Differential Aptitude Verbal Reasoning Test to ninth
grade boys in public high school:

11, 1, % 24, 17, 7, 17, 1, 10, 10, 8, 11, 8, IS, 8, 8, 9, 8, 18,
18, 23, 6, 25, 12, 10, 8, 32, 16, 20, 15, 19, 12, 34, 13, 10, 10, 12,
26, 4, 23, 23, 27, 6, 1, 29, 20, 11, 24, 7, 17, 4, 17, 22, 6, 18, 18,
3, 2, 6, 29, 25, 20, 4, 28, 21, 4, 15, 4, 14, 16, 11, 3, 0, 5, 15, 20,
8, 14, 12, 23, 31, 24, 33, 6, 27, 14, 11, 18, 22, 21, 3, 11, 5, 15,
11, 5, 12, 6, 17, 22, 24, 17, 22, 21, 15, 10, 16, 13, 12, 26, 18, 8,
4, 40, 14, 16, 19, 23, 13, 21, 13, 7, 8, 30, 22, 11, 14, 6, 16, 17,
24, 9, 9, 8, 5.

Using procedures suggested in any standard text in statistics,
construct an ogive curve for this array of scores. Make a percentile
table, comparing the percentile equivalents obtained with
those published in the manual for the Differential Aptitude
Test Battery. What differences do you find in the comparison?
What factors might account for the differences or similarities?
What are the implications for counseling an individual student in
terms of his status in your school?

6. Discuss the relative merits of national and local norms. Under
what circumstances or in what counseling situations would each
be most appropriate?

7. Select a standardized psychological test and note the directions to 
be used in giving the test. What are the possible misinterpretations
of directions on the part of the student? How would you rewrite
the directions to take care of possible misinterpretations of the
task to be done? Under what conditions of use might the
directions be rewritten?

8. For the test selected in Exercise 7 above, prepare what you believe
would be a statement that might be used to motivate the students
to do their best.
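
Exercises 4 and 5 both rest on two small computations: turning an array of raw scores into percentile ranks, and laying a band of one or more standard errors of measurement around an obtained score. The sketch below is one way these might be worked in Python. It assumes the midpoint definition of percentile rank found in most statistics texts and the usual relation SEM = SD x sqrt(1 - r). The function names, the ten illustrative scores, and the standard deviation of 10 are hypothetical; only the reliability of .90 echoes a figure quoted in Exercise 3.

```python
from collections import Counter
from math import sqrt


def percentile_ranks(scores):
    """Map each distinct raw score to its percentile rank.

    Midpoint convention: cases below the score plus half the cases
    at the score, divided by the total number of cases, times 100.
    """
    n = len(scores)
    freq = Counter(scores)
    ranks = {}
    below = 0  # cumulative frequency below the current score
    for score in sorted(freq):
        f = freq[score]
        ranks[score] = 100.0 * (below + 0.5 * f) / n
        below += f
    return ranks


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def score_band(score, sem, k=1):
    """Return the interval score +/- k standard errors."""
    return (score - k * sem, score + k * sem)


# Hypothetical scores for illustration only (not the Exercise 5 array).
scores = [4, 8, 8, 10, 12, 12, 12, 15, 17, 20]
ranks = percentile_ranks(scores)

# Hypothetical SD of 10; reliability of .90 as quoted in Exercise 3.
sem = standard_error_of_measurement(sd=10.0, reliability=0.90)
low, high = score_band(12, sem, k=1)

# Percentile ranks of the tabled scores that fall inside the band.
in_band = {s: r for s, r in ranks.items() if low <= s <= high}
print(in_band)
```

With these made-up figures, a band of one standard error around a raw score of 12 takes in tabled scores whose percentile ranks run from 35 to 75, which is the kind of spread Exercise 4 asks the reader to weigh before interpreting a single score.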




CHAPTER IV 


Test Scores: Etiology and Interpretation


Table 14. Scores

Date      Test                                Form   Score   Derived Score
1/16/58   Jones Test of Mental Ability        A              IQ 100
3/14/58   Iowa Silent Reading                 A-i    144     grade 7-1
          (Median Standard Score)
4/20/58   Bennett Mechanical Comprehension    A      4}      75th percentile

When a counselor notes test scores similar to those in Table 14 he
may feel that he can carry on from there: that he has an indication
of his counselee's performance as specified by the title of the tests
in terms of quality and quantity. If so, he may be tempted to make
such statements as the following:

"He has an IQ of 100." 

"His reading ability is at the seventh grade level.” 

"His score on a mechanical aptitude test is at the 75th percen- 
tile.” 

These are samples of the kinds of statements commonly made 
by teachers and counselors when they are reporting about pupils
or counselees. From such statements it is not uncommon for them
to make deductions about whether the student is working up to
capacity or even beyond his capacity, to make recommendations 
about choices that must be made, or to predict his performances 
in future educational or vocational activities. They should realize, 
however, that the deductions that can be made, the counsel that 
can be offered, and the actions that can be taken on the basis of 
the test scores are determined by many factors that lie behind and 
beyond them. If the counselor is going to use test scores wisely he 
will find that he must proceed in opposite directions: back into 
the test and forward into the implications of scores. His findings 
on the former will determine how far, if at all, he can extend the 
latter. In this chapter the factors that are inherent in test scores, 
factors which may influence them, and the implications of these 
factors in their interpretation will be examined. 

SELECTION OF A TEST 

The reader will have discovered through his review of test pub- 
lishers’ catalogues, specimen sets of tests, and other sources such 
as the Mental Measurements Yearbooks, that the counselor will
have many hundreds of tests from which to choose. It is essential
that the counselor be guided in his choice of tests by the criteria
presented in this and the previous chapter. Even with the
most careful initial selection, however, he will find that he is dealing
with imperfect instruments at best, and that his "best" will
have its own unique characteristics and properties. He will find
that each test is different even though it may carry a title similar
to some other test, and that there is no "standard" interpretation
that can be applied even to tests which purport to measure the
same thing. Each test must be interpreted in light of its background
and development, and each test may be interpreted differently 
for different individuals according to the factors or circumstances 
which surround the individual's approach to the test and his
background. The counselor, then, will ask many questions about a test
prior to its selection and, having selected an instrument, he will
keep many of the same questions in mind when interpreting the
results.

As the counselor contemplates test-result entries such as the
above, then, he must ask himself what factors need to be consid- 
ered before such entries become really meaningful. What does 
each entry indicate about the student on whose record it appears? 
What are the implications of the scores and what can they possibly 
suggest about next steps in working with the counselee? The an- 
swers to such questions will require a very thorough investigation 
of the competency of the test authors and publishers, the assump- 
tions they have made in the construction, scoring, and norming of 
the test, the quantity and quality of the evidence they have
presented concerning the efficiency of the instrument, the conditions
under which the test was given, the response of the student to the
test situation, and the care with which the scoring and tabulating
of scores was done. These are all factors influencing a given test
score, and until the counselor has become convinced as the result
of his investigations that none of these circumstances are faulty,
he cannot use the test score with assurance. The inquiries that he
should make and the kinds of information he is likely to find are
described in the following pages. 

Before proceeding with a discussion of the factors that may in- 
fluence test scores, it should be pointed out again that scores ob- 
tained from tests with similar names often yield quite different 
results. It is always desirable, therefore, to give the specific name
of the test from which a score was derived whenever it is
mentioned. It is always necessary, too, to remember that no one has
an IQ or an ability. The test score can indicate only his perform- 
ance on a given test at a given time, and cannot possibly indicate 
possession of an IQ, a percentile, a grade level, or an aptitude. 
Even when we use these cautions and say, for example, that a 
counselee has scored at a certain level on a certain test at a
particular time, it is possible to make only certain limited kinds of
interpretations since the size and significance of the score will have
been influenced by many factors.

FACTORS INFLUENCING TEST SCORES 

AUTHORS OF TESTS 

To suggest that the author of a test can influence a test score 
seems obvious, but the point here is not always obvious to the test 
user. While there are many tests with essentially the same title,
"(Such-and-Such) Test of Mental Ability," the underlying concepts
of mental ability and procedures for measuring it may vary greatly.
They depend upon the training, psychological orientation, and
concepts of intelligence held by their authors. These may be
determined by their adherence to a particular school of psychological
thought, their philosophy, and even their sociological and statis- 
tical orientations. When one buys any author's test, he also buys 
and uses the author's assumptions and concepts. These may be 
crucial, but they may very well be overlooked by the counselor 
when he interprets a score from a "mental ability test.” Definitions 
of terms would not be a factor if we had universal agreement as 
to what mental ability really is, how it is manifested, and how it
can be measured, but since this is not the case, the test author is 
very much a part of the test score. 

The counselor may begin the study of a test by examining
biographical sketches of test authors provided in a number of
professional publications.¹ From them, he may learn of the author's
training, experience, and competence in the field of measurement.
In some cases he will find biographical sketches of test authors in
the manual which accompanies the test. Attempts should be made
to obtain information about the current as well as previous
activities of the authors to determine whether or not they are carrying
on professional rather than commercial activities. The counselor
should also be aware that an author may have achieved high
prestige for many years because he presented theories and practices
that seemed to be sound at the time but that later proved to be
very unsound.

¹ Jaques Cattell (Ed.), American Men of Science, Vol. III, The Social and
Behavioral Sciences. New York: R. R. Bowker, 1956.

J. McKeen Cattell, Jaques Cattell, and E. E. Ross (Eds.), Leaders in Education,
2nd ed. Lancaster, Pa.: The Science Press, 1941.

Directory of the American Psychological Association. Washington, D.C.: The
American Psychological Association, 1957.

Study of the author’s record must be supplemented, however, by 
examination of his actual performance in the construction of the 
test under consideration. If the counselor finds, for example, that 
authors of unquestionable repute claim that the best evidence of 
the validity of the test they offer is to be found in its successful 
(undefined) use over a period of years, he must weigh current
performance more heavily than previous reputation. If an author
has been active in psychological and educational measurement over
such a long period of time that his name has become almost a
household word, but is still evasive in the discussion of norms and
validity in a test manual, the latter performance is the one about 
which the counselor must be concerned. And if an author violates 
in his construction of a test what he proclaims to be essential in 
his books and pamphlets on measurement, the counselor must ques- 
tion the test, the writings, or both. 

Perhaps the counselor who has noted the issues raised above has 
reached the conclusion that the reputation of an author can suggest 
general competence, but that it does not guarantee that each test 
he produces will be useful and dependable. Study of the biograph- 
ical sketch of an author is a first essential step in the process of 
test selection, but it must be followed by intensive study of the 
information provided in the test manual. 

PUBLISHERS OF TESTS 


It is admittedly a difficult task for a counselor in a public school
or college to make a judgment about something as intangible as
the reputation of a test-publishing house. In a field of publications
that has become highly competitive and profitable in the last
generation (over 75 million tests were used in schools in the past
year), it has become increasingly necessary for test users to
distinguish between sales promotion and careful research in test
design and improvement. Counselors need some background of 
experience in dealing with the products of the several major test- 
publishing concerns before they can develop some basis of judg- 
ment of the "name” or reputation of a particular publisher. Some 
criteria that may help them in this evaluation have been presented
in Chapter II. 

Information about publishers of tests is difficult to obtain unless 
the counselor has had long experience in dealing with them. He 
must judge them by examining their catalogues to see if they con- 
tinue to offer for sale tests that are demonstrably obsolete or in- 
adequate, if they restrict the sale of test materials to competent 
persons, and if they provide adequate descriptions of their mate- 
rials. In this connection, it is particularly important to note whether 
the test publisher makes the distinction between validity and relia- 
bility and does not imply that reliability means dependability. 
Some evidence of a publisher’s position may be obtained from 
his willingness to answer, without equivocation, letters request- 
ing supplementary information about his tests or asking for ex- 
planations about test data that are not clear. The quality of 
discussions about tests in the pamphlets that they issue may give 
some clues concerning the publishers' policies. The counselor will 
examine that material to see if they present information and re- 
search results that may be useful regardless of whether it concerns 
or requires the use of only the tests they produce. He will examine 
all the publishers' material, too, to see if they propose ready-made
testing programs that may seem good to psychometrically naive
persons but do not take into consideration local circumstances and
needs. 

The reputation of the publisher becomes important when the
counselor is considering the final score obtained on the test,
particularly if it suggests questionable ethics and practices. If the
counselor is inclined to say, "They know more about tests than I
do," or "No company can afford to market a questionable
instrument," then he accepts everything about the test with which the
publisher had anything to do. The fact that not all published tests 
are equally good, theoretically sound, and statistically acceptable, 
but still may find a publisher, suggests that not all publishers set 
for themselves the strict standards that should be met by all. Since 
there is a tremendous market for psychological tests, and since 
their publication in many instances has become a highly competitive
profit-making venture, not all publishers have resisted the
temptations (or even need) to meet the competition by marketing
tests without sufficient development, standardization, or validation
at the ultimate expense of the student on whose cumulative
record the score is noted.

Occasionally a good company may publish a very poor test. The
counselor must always, therefore, look carefully at each test
produced by any publisher to see if it is likely to provide a meaningful
score for the particular counselee with whom he is working.

The publisher's role in the production of a test may vary, of 
course. In some instances, publishers assume the responsibility for 
all the statistical development of a test, relying on the author only 
for basic test items and some other details. On the other hand, 
some publishers, with some tests, do little more than provide a 
drafting or design function and act as the marketing agent,
doing nothing with statistics, development of norms, reliability, 
validity, or subsequent revision of a test. 

TEST TITLES AND ASSUMPTIONS INVOLVED IN 
CONSTRUCTION OF TESTS 


It was indicated in an earlier chapter that the title of a test does
not indicate the kind of items it contains. Many tests are called
tests of mental ability or mental maturity, for example, but since
there are many definitions of such terms it will be necessary for 
the counselor to determine which of them is employed by the au- 
thors of tests. This can be done by examining the writings in test 
manuals, in books, and journal articles. The following illustration 
of a method for doing so may be helpful to counselors who are in 
the process of selecting tests. 

If one were to consider use of the Terman-McNemar Test of
Mental Ability,² for example, he might examine the writings of
the major author to see if he can get definite statements or clues
concerning his concept of mental ability.

Terman was the author of the Stanford-Binet test. Since the
Terman-McNemar Test of Mental Ability seems to be an attempt
to put the principles used in the Stanford-Binet into group test
form, it is probably safe to assume that the terms "mental ability"
and "intelligence" are used interchangeably. Terman, the major
author, is on record with respect to the definition of intelligence
(or mental ability) in the following words:

In the case of intelligence it may truthfully be said that no adequate
definition can possibly be framed which is not based primarily on the
symptoms empirically brought to light by the test method. The best
that can be done in advance of such data is to make tentative
assumptions as to the probable nature of intelligence, and then to subject
those assumptions to tests which will show their correctness or
incorrectness. New hypotheses can then be framed for further trial, and
thus gradually we shall be led to a conception of intelligence which
will be meaningful and in harmony with ascertainable facts.³

Terman has attempted, as Binet did, to analyze some of the
mental processes which the tests bring into play. The chief
procedure is, as noted in the quotation, to base definitions primarily
on the symptoms empirically brought to light by the test method.
The method here consists of "obtaining a general knowledge of
the capacities of a subject by the sinking of shafts at critical
points."

² Terman-McNemar Test of Mental Ability. Yonkers, N.Y.: World Book Co.,
1942.

³ Lewis M. Terman and Maud A. Merrill, Measuring Intelligence. Boston:
Houghton Mifflin Co., 1937, p. 4.

Using the "sinking of shafts" method, Terman described the
procedure for selecting the items of the original group test of
mental ability. The items were selected as follows: "The test as it
now stands is composed of questions and problems which were
selected from a much larger number by correlating each separate
item with a dependable measure of mental maturity. The criterion
used for this purpose was a composite which included grade
location, age, total score on a two-hour mental test, and ratings of the
pupils by from two to five teachers on intelligence and quality of
school work. Try-out of these resulted in the elimination of three
of the thirteen tests, and in the reduction of the 610 items in the
remaining tests to 370. All items which failed to differentiate
pupils of known brightness from known dullness were
eliminated."⁴
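
The item-selection procedure quoted above, in which each item is correlated with an outside criterion and the items that fail to differentiate are dropped, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the manual as quoted gives no cutoff, so the minimum correlation of .30 is hypothetical, as are the function names, the six pupils, and the three items. The criterion list stands in for the composite of grade location, age, test score, and teacher ratings.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # An item everyone passes (or fails) cannot differentiate pupils.
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_items(responses, criterion, min_r=0.30):
    """Keep items whose correlation with the criterion reaches min_r.

    responses: dict mapping item name -> list of 0/1 item scores.
    criterion: list of composite criterion values, one per pupil.
    min_r:     hypothetical cutoff; the quoted manual states none.
    """
    kept = {}
    for item, answers in responses.items():
        r = pearson_r(answers, criterion)
        if r >= min_r:
            kept[item] = r
    return kept


# Hypothetical data: six pupils ordered from low to high on the criterion.
criterion = [1, 2, 3, 4, 5, 6]
responses = {
    "item_a": [0, 0, 0, 1, 1, 1],  # passed only by the higher pupils
    "item_b": [1, 0, 1, 0, 1, 0],  # unrelated to the criterion
    "item_c": [1, 1, 1, 1, 1, 1],  # passed by everyone
}

kept = select_items(responses, criterion)
print(kept)
```

Only the first item survives here. The point for the counselor is that whatever items remain are, by construction, those that agree with the criterion the author chose, so the meaning of the final score depends on the soundness of that criterion.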

The revisions of the old form were designed to make the test
now under consideration provide "more homogeneous material in
order to have a test more highly saturated with a common
ability."⁵ The arithmetic and numerical subtests were eliminated so
that the "scores of any two individuals are more nearly
comparable qualitatively; i.e., they lie along the same continuum. This
continuum may be characterized as general verbal intelligence. This
particular change has, of course, been prompted by recent
developments in factor analysis."

Some other minor revisions to permit more rapid scoring and
some test substitutions have been made, but the reasons for the
latter changes are not given in the manual. It is indicated that the
correlation of the revised test with the original test is .91 "which
indicates that the new test can be considered to be measuring
essentially the same basic abilities covered by the original forms."

At this point the counselor who proposes to use the test must 
stop to consider some of the factors that are involved. The title of 
the test is the Terman-McNemar Test of Mental Ability, but when 
he reads the manual he finds that the test attempts to measure gen- 
eral verbal intelligence. Now he must decide whether he wants to 
use an instrument that excludes arithmetic and numerical subtests. 
Shall he accept the conclusions arrived at by factor analysis, or any 
other method that happens to be currently popular, that a test of 
mental ability may exclude any reference to performance with 
numerical symbols? Will it be clear when the score appears on the 
student's record that this test was limited to verbal materials? Will 
he now have to add tests that do sample spatial and numerical 
materials? 

In the illustration given above, it has been rewarding to go back 
and find rather complete and clear statements about what the au- 
thors of the test were attempting to measure. Whether or not one 
agrees with the authors, one knows what the authors mean by the 
title of the test and the assumptions that they have employed. 

The fact that all tests bearing similar titles are not necessarily 
similar and do not use the same basic assumptions is revealed when 
the statements given above are compared with those found in the 
manual of another test which also has the term "mental ability” 
as part of its title. “A measure of a pupil’s brightness, called an 
Intelligence Quotient (IQ), is sometimes found by dividing the 
pupil’s mental age by his chronological age. A measure of bright- 
ness of a pupil comparable to an intelligence quotient (IQ) ob- 
tained on the Binet Scale may be found by comparing his score in 
the Gamma Test with the norm for his age ... A measure so 
found is not a quotient, but it is called an 'IQ' because it has the 
same significance.”⁸

⁸ Manual for the Otis Quick-Scoring Mental Ability Test, Gamma Form EM.
Yonkers, N.Y.: World Book Co., 1953, p. 5.

At this point the counselor may discover that what he has is a
"deviation" (from the norm) IQ rather than a quotient. He may
also be led to believe, because the manual suggests that he has
something "comparable to an intelligence quotient (IQ) obtained
on the Binet scale," that there is no difference: that he has
obtained in the 30 minutes required for the testing something that
would have taken an hour or more to get if the individually given
and highly respected Binet were administered. But is this indeed
what he has? How "comparable" are the IQ's so obtained? Is the
IQ exactly the same or only an approximation of the Binet? Are
they interchangeable? Or are they just "comparable" in principle
or theory? He gets some hint as he reads further: "Gamma IQ's
found in this method tend to be somewhat less variable than
ordinary IQ's, that is, they tend to be somewhat nearer to 100. This
fact should be borne in mind if comparisons are made between
Gamma IQ's found above, and ordinary IQ's found by the division
method."⁹
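
The difference between a quotient and a deviation from a norm may be easier to see when the two are computed side by side. The sketch below is illustrative only. The quotient follows the division method described in the manual, with the customary multiplication by 100. The deviation form is an assumption: the passage quoted says only that the pupil's score is compared with the norm for his age, and here that comparison is taken to be a simple difference added to 100. The function names and all figures are hypothetical.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Quotient IQ: mental age divided by chronological age, times 100."""
    return 100.0 * mental_age_months / chronological_age_months


def deviation_iq(raw_score, age_norm):
    """Deviation-type IQ.

    ASSUMPTION: taken here as 100 plus the difference between the raw
    score and the norm for the pupil's age. The exact rule belongs to
    the test manual and is not reproduced in the text.
    """
    return 100 + (raw_score - age_norm)


# Hypothetical pupil: mental age 14-0, chronological age 12-0.
binet_style = ratio_iq(mental_age_months=168, chronological_age_months=144)

# Hypothetical pupil: raw score 48 where the norm for his age is 42.
gamma_style = deviation_iq(raw_score=48, age_norm=42)

print(round(binet_style, 1), gamma_style)
```

Both results are written "IQ" on a cumulative record, yet they come from different arithmetic on different quantities, which is why the entry should always carry the name of the test.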

Now he must ask, how much "less variable"? Does this mean
he needs to compensate at the extremes, that is, that the low IQ
tends to be higher than it should be and that the high IQ tends to be
lower than it should be? Does not this fact need to be borne in mind
whether or not he happens to be comparing IQ's derived from
different sources and by different methods? This variability, where
the IQ is 100, may not be of consequence. A score at the extremes
may be, but the counselor receives no help from the manual in
terms of how much compensation is needed.¹⁰ In any event, if the
counselor is going to use the test results, he will need to be aware
that not all IQ's are the same just because they might be so
designated on the cumulative record, and that variability may be
important in the case of the particular counselee with whom he is
working.

⁹ Manual for the Otis Quick-Scoring Mental Ability Test, op. cit., p. 4.

¹⁰ The publishers of this test have supplied some of the data missing in the manual
in their Test Service Notebook, No. 11, "A Comparison of the Results of Three
Intelligence Tests," a report of an investigation by Roger T. Lennon. Psyche Cattell
has also discussed this variability in relation to earlier forms of the Otis tests in
two articles: "IQ's and the Otis Measure of Brightness," Journal of Educational

The two illustrations given above have indicated that two tests 
with similar titles may differ in the basic assumptions used by the 
author and therefore in the results produced. 
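
The question of how much compensation is needed when one set of IQ's is "less variable" than another can be made concrete with a small sketch. It assumes that both scales center on 100 and differ only in spread, and that a pupil's relative standing (his distance from 100 in standard deviation units) is what should be preserved. The two standard deviations are hypothetical, chosen only to show the direction of the effect; the manual discussed above supplies no such figures.

```python
def rescale_iq(iq, sd_from, sd_to, mean=100.0):
    """Express an IQ from one scale on another scale with the same mean.

    The pupil's standing in standard deviation units is kept constant.
    """
    z = (iq - mean) / sd_from
    return mean + z * sd_to


SD_NARROW = 12.0  # hypothetical spread of the less variable IQ's
SD_WIDE = 16.0    # hypothetical spread of the more variable IQ's

for iq in (76, 100, 124):
    print(iq, "->", rescale_iq(iq, SD_NARROW, SD_WIDE))
# 76 -> 68.0
# 100 -> 100.0
# 124 -> 132.0
```

Under these assumed figures an IQ of 100 needs no adjustment, while scores far from 100 shift by several points, in keeping with the observation that the variability matters chiefly at the extremes.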

In his consideration of tests of mental ability the counselor might 
also want to examine a test with a similar title, but with somewhat 
different assumptions involved. He might then turn to a study of 
the California Test of Mental Maturity.¹¹ This test was first
introduced at a time when factorial studies of intelligence and mental
test score performances were beginning to challenge the more
generally held contention of "g" or a central component of overall
intelligence. The authors of the California Test of Mental
Maturity acknowledged their theoretical indebtedness to the findings
of the multiple factor analysts. They also seemed to accept the
principle of specificity and independence of differing kinds of
intellectual behavior such as memory, spatial relations, reasoning,
number facility, and verbal ability. In their descriptions of the tests 
the authors recognized the need for differential analysis of scores 
on tests, but showed sharp differences in the combination of "fac- 
tors” that went to make up the Language and Nonlanguage IQ's 
and MA’s that this test purports to measure. 

The need for diagnosis of intraindividual trait differences was 
called to the reader's attention in 1931 by the writings of one of 
the authors of the California Test of Mental Maturity. He indi- 
cated that the problem of measuring trait differences was becom- 
ing increasingly important. In a revision of this work, he further 
stated: ". . . to be useful in the diagnosis and solution of learn- 
ing difficulties, intelligence test data must reveal the ways in which 
different students learn and the reasons for their failure to learn 
effectively. . . . Research has done much to clarify this situation 
and reveal that such factors as perceptual speed, memory, compre- 
hension, mathematical ability, inductive and deductive reasoning, 
and verbal abilities are combined in various ways to produce dif- 
ferent as well as identical IQ's." 

Manual for the California Test of Mental Maturity. Los Angeles: The Cali- 
fornia Test Bureau, 1957. 

Ernest W. Tiegs. Tests and Measurements for Teachers. Boston: Houghton 
Mifflin Co., 1931, pp. 44 and 295-296. 

The author then acknowledged the factor analysis work of such 
men as Hotelling, Kelley, Thurstone, and Guilford in a footnote. 
At the same time he stated that the California Test of Mental Ma- 
turity was developed to bring out more or less independent yet 
interrelated abilities, so that a teacher might analyze and evaluate 
each group of factors of mental maturity separately. In the 1947 
edition of the manual a statement was made that the authors 
". . . [believe] that the multiple factor theory of intelligence comes 
nearer to explaining observable phenomena than does the strong 
central-factor theory alone . . ." In the 1950 edition of the man- 
ual and subsequent revisions the reader is told that the theoretical 
framework for the development of the CTMM was based upon 
Elizabeth T. Sullivan's "Psychographic Record Blank." Her in- 
strument was designed to analyze Stanford-Binet performance and 
to set up some 14 categories into which response items might be 
grouped. It was from this "conceptual framework" that items were 
developed for what was to be named the California Test of Mental 
Maturity. Further statements were made in the 1951 manual that 
test data from these items were then ". . . factor analyzed by the 
Thurstone Centroid Method. . . ." 

The selection of items found in the California Test of Mental 
Maturity was a subject of further comment in a statement by one 


Ernest Tiegs. Tests and Measurements in the Improvement of Learning. 
Boston: Houghton Mifflin Co., 1939, pp. 39, 

An almost identical reference to the same sources on factor analysis is made in 
the California Test of Mental Maturity manuals issued up to 1950. 





of the authors who said that its items ". . . obviously sample a 
wide variety of relationships involving immediate and delayed re- 
call, spatial relations, logical and numerical reasoning, and verbal 
concepts . . . ." 

Careful examination of each of the subtests indicates to the 
counselor that his counselee must do the following tasks to achieve 
scores in the subtests named. 


Table 15. Tasks Set by Subtests 

Name of Subtest                   Task 

Test 1. Immediate Recall          Recall the second of a series of words 
                                  after original oral reading in couples. 

Test 2. Delayed Recall            Select from multiple-choice items de- 
                                  tails of a story read orally 30 minutes 
                                  before. 

Test 3. Sensing Right and Left    Identification of lefts and rights from 
                                  among twenty pictures of hands, feet, 
                                  ears, gloves, and wings in differing 
                                  positions. 

Test 4. Manipulation of Areas     Identify abstract spatial patterns similar 
                                  to paper form boards. 

Test 5. Opposites                 Identify odd and opposite pictures among 
                                  pictorial items of similarities and one 
                                  opposite. 

Test 6. Similarities              Identify correct and similar pictorial 
                                  items from among several pictures. 

Test 7. Analogies                 Choose among pictorially presented 
                                  analogies. 

Test 8. Inference                 Select from verbal syllogisms with 
                                  multiple-choice options. 

Test 9. Number Series             Identify incorrect number from pat- 
                                  terned serial presentation of numbers. 

Test 10. Numerical Quantity       Determine correct coins needed to make 
                                  predetermined amount. 

Test 11. Numerical Quantity       Solve word problems in arithmetic with 
                                  multiple-choice options. 

Test 12. Verbal Concepts          Identify correct vocabulary items with 
                                  multiple-choice options. 


Ernest W. Tiegs. "The Proper Use of Intelligence Tests." Educational Bulletin, 
No. 14, Los Angeles: California Test Bureau, 1945. 




In later revisions of the manual of the California Test of Mental 
Maturity the authors state that the tests were designed to measure 
more of the types of intellectual processes than the Stanford-Binet 
was designed to measure. The evidence they offer to support this 
theoretical contention is a short statement that the total scores of 
the California correlate as high as .88 with the Stanford-Binet. 

It should be obvious from the illustration given above that the 
basic assumptions made by the authors of a test will determine the 
form of the test. The counselor must examine each set of assump- 
tions very carefully and decide which one is most acceptable to 
him and which is most likely to provide results that can be inter- 
preted meaningfully to his counselee, his parents, his current or 
prospective teachers, and potential employers. 

As he considers IQ scores on a student's record the counselor 
may decide that though the scores are derived from different as- 
sumptions and processes, he can compensate for the resulting dif- 
ference. Most mental ability tests suggest that scores can be used 
to assist pupils to choose wisely in planning their educational and 
vocational programs, and many counselors may accept such state- 
ments uncritically. If a counselor accepts the results of the tests at 
their face value, he may find himself saying of a low score that the 
student just "doesn't have it, and there is not too much use in 
pressing him," or, "I must try to dissuade this youngster from his 
aspirations," or "College is obviously beyond this student." Or, of 
the counselee who makes a high test score but achieves low marks 
the counselor might say, "Obviously, this student is falling down 
on the job. He can do much better." The consequences of such 
interpretation can be serious in terms of a youth's future. From the 
notation "IQ, 100" may come a decision that action in the direc- 
tion of motivating the student to do better would be fruitless, or 
a decision to try to get the student to change his program from 
college preparatory to something "easier." Or perhaps he may con- 
sider the desirability of recommending that the student be placed 
in a "slow" section of some class where he will not be expected 




to show much progress. Before he makes such decisions, however, 
the counselor may want to check his "score" more closely and con- 
tinue consideration of factors that may influence it. 

VALIDITY REPORTS IN TEST MANUALS 

No one single criterion for the evaluation of a psychological 
test is as important as validity. In the final analysis factual data, 
supported by research studies that show how well the given instru- 
ment does what it purports to do, are of much greater importance 
than all the other commonly used criteria combined. In counseling 
with individual students, predictive validity is the type of validity 
data that seems to be most useful. Granting the importance of 
such considerations as reliability, ease of scoring, readability, costs, 
time for administration, adequacy of norms, and care in standard- 
ization, all these come to naught unless specific and exact informa- 
tion about predictive validity is presented in the test manual. It 
should give assurance to the counselor who proposes to use the 
test that the traits, "factors," or skills he thinks he is measuring 
have some relationship to identifiable criteria that have meaning 
for the case in hand. 

If it was well established that a test did what it was supposed 
to do, the points noted above need not concern the counselor. Test 
manuals usually suggest that mental ability tests "can be used 
effectively" for the following purposes: 

1. Classification of pupils according to their ability to profit from 
education. 

2. Guidance of individual pupils in making educational and voca- 
tional choices. 

3. Adapting levels of instruction to the ability levels of individuals 
or classes. 

4. Interpreting levels of achievement of individuals or groups. 

5. Providing research information about the mental levels of 
groups or individuals. 

6. Obtaining diagnostic profiles to aid teachers and counselors in 
discovering learning difficulties. 

7. Obtaining clues for educational and vocational guidance from 
separate factor scores. 

8. Predicting success in certain fields of work. 

9. Providing evidence of mental deterioration. 

Such lists as the above imply that mental ability tests can do 
much to assist a counselor in his work with individual students 
and in his attempts to assist other members of a school faculty in 
reaching common goals. But the counselor cannot accept such state- 
ments unless they are supported by empirical evidence. Unless it 
is available he must resist the temptation to predict his counselee's 
chances to deal successfully with global concepts, rely on text- 
books, be successful in recipe construction, or do well in drama, 
all specifically mentioned in commonly used test manuals. He must 
inhibit tendencies on his part to spot diagnostic signs of mental 
deterioration, lack of preciseness in relations with others, or diffi- 
culty with music on the basis of performance on a test. He might 
look in vain for even a simple study that shows the relationship 
of scores on tests and subsequent teachers’ marks — the usual cri- 
teria offered as proof of predictive validity for nearly any such test. 

It is difficult to believe that a major test publisher or well-trained 
test builder could market a test that offers absolutely no empirical 
evidence that it does the job its authors claim for it, but unfortu- 
nately no test manual contains evidence that is fully convincing on 
any of its claims. Some test authors admit that the validity of any 
mental test is difficult to establish and, providing no data, indicate 
that evidence of validity must wait for further knowledge about 
mental development. Others present general statements to the ef- 
fect that the test has been used successfully over a long period of 
time and indicate that in some situations students who have gradu- 
ated with honors made scores in the highest ranges of the test. 
Still others suggest that a mental ability test is valid if it indicates 
the probable rate of progress a pupil will make in getting through 




school. Many authors beg the question and give as chief evidence 
of validity that the test scores correlate highly with those of another 
test that bears a similar title. 

The use of a patent-medicine-advertising form for presenting 
evidence of validity must come as a rude shock to anyone who has 
been led to believe, as a result of the statistical data presented in 
the section on the construction of the test, that he was dealing with 
a scientific instrument. Because a test has been successfully used 
(no data given to amplify the word "successful") the counselor is 
supposed to accept the statement that it is good. Does usage always 
assure value? Although it is claimed that the test is effective in 
doing the nine things described above, there is only evidence that 
in some situations honor graduates made scores in the highest 
range of the test. How many are "some situations"? Does the use of 
"some" imply that, in others, the conditions did not hold? What 
about progress through school as a validity criterion? What fac- 
tors other than brightness may influence it? Could such factors as 
health, frequent change of school and the attendant problems of 
adjustment and variability of curriculum, be important in determin- 
ing progress? 

If a counselor were to look for straightforward evidence that 
mental ability tests can do what their authors suggest, he may be 
somewhat upset as he is led through series of confusing ration- 
alizations that ultimately prove to be less than satisfying. He 
must reject many of the tests for his own protection and for the 
protection of his counselees. Consider the difficulties he might 
get into if he were to try to interpret some test scores to a bright 
senior high school student and his parents with whom enough 
rapport has been developed so that they felt free to ask penetrating 
questions. How would he answer them when they asked for evi- 
dence about how well the test did what it was supposed to do? 
His answer that it had been used "successfully" would elicit a fur- 
ther questioning on the definition of "successfully," which could 
be answered only by the vague statement that in some situations 
honor graduates made highest scores. And he would not have the 
slightest evidence about those situations or where they might be 
found. He would have no evidence with which to answer their 
questions about the value of tests in classifying students, or guid- 
ing them in making educational and vocational choices that are 
consistent with their educational level. The poker-playing parent 
would probably conclude that he had caught the counselor bluffing. 

Durrell's suggestion that "at least 25 percent of the children who make slow 
progress in school are of normal or superior intelligence" may cause the counselor 
to further question validity evidence in the form of rate of progress in school. 
Donald D. Durrell. "Learning Difficulties Among Children of Normal Intelligence." 
The Elementary School Journal, December, 1954, 55:201-208. 

The problem of variability of performances in the various tests, 
too, may cause considerable difficulty in interpretation. The selec- 
tion of a test for measurement of "mental ability" can mean much 
when a counselor is deciding what he has and what he can do 
with it. Some of the variability in yield of tests was indicated ear- 
lier and Travers writes to the same point: "Intelligence quotients 
from different group tests may measure somewhat different aspects 
of intelligence . . . variations in the psychological process in- 
volved, and the varying emphasis on numerical, perceptual, and 
verbal materials, may seriously affect the uses to which the scores 
might be put." One example may illustrate the point as it applies 
to mental ability tests. If a counselor examines one test he may 
find that he has selected a "verbal test of mental ability" that in- 
cludes, among others, items that require arithmetic reasoning. If 
the counselor had chosen instead the Terman-McNemar Test of 
Mental Ability, he would also have had a measure of "general 
verbal intelligence," according to the manual for that test, but in 
the latter instance the arithmetic and numerical items were ex- 
pressly omitted. That the choice among available tests of mental 
ability may well influence the score or IQ that is recorded on the 

"A Comparison of Results of Three Intelligence Tests." 
. . . Measurement. New York: The Macmillan 




cumulative record should now be clear to the counselor. Though 
the designated measure, mental ability, would still be indicated by 
a particular test title, the actual results and implications for coun- 
seling might vary greatly. 

The counselor cannot avoid this confusion and must wonder at 
times what he has when a test score has been obtained. Of course 
he may feel that it might not matter greatly what differences he 
finds between tests if the one he has selected for use does what the 
manual says it does and what he wants it to do. 

In view of the fact that counselors would be well advised to 
avoid the use of tests for which there is no empirical evidence of 
validity presented in the form of expectancy tables, it would not be 
necessary to consider additional facts about them. Since, however, 
he may find satisfactory evidence of validity about a test he should 
then go on to further study of its characteristics. He may continue 
to look at his test to appraise additional factors such as those de- 
scribed below that must be considered in choosing a test and inter- 
preting the scores he obtains from it. 
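
The kind of expectancy table referred to above can be illustrated with a short sketch. It is offered only as a minimal example under stated assumptions: the follow-up records, the score bands, and the function name `expectancy_table` are hypothetical and are not taken from any test manual. A counselor with local follow-up data might tabulate, for each band of scores, the percentage of students who later met some criterion such as satisfactory marks.

```python
from collections import defaultdict


def expectancy_table(records, bands):
    """Percent of students in each score band who later met the criterion.

    records: iterable of (test_score, met_criterion) pairs from a follow-up study.
    bands:   list of (low, high) inclusive score ranges.
    Returns a dict mapping each band to a percentage, or None if the band is empty.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [number meeting criterion, total]
    for score, met in records:
        for low, high in bands:
            if low <= score <= high:
                counts[(low, high)][0] += int(met)
                counts[(low, high)][1] += 1
                break
    table = {}
    for band in bands:
        met, total = counts[band]
        table[band] = None if total == 0 else round(100 * met / total, 1)
    return table


# Hypothetical follow-up data: (IQ score, earned satisfactory marks a year later)
records = [(85, False), (88, False), (92, True), (95, False), (101, True),
           (104, True), (108, False), (112, True), (118, True), (125, True)]
bands = [(80, 99), (100, 119), (120, 139)]

for band, percent in expectancy_table(records, bands).items():
    print(band, percent)
```

With these invented records, 25 percent of the lowest band, 80 percent of the middle band, and the single student in the highest band met the criterion. A band containing so few cases would, of course, support no firm statement to a counselee.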

TRYOUT AND NORMATIVE PROCEDURES 

Since a score that the counselor is considering represents a com- 
parison of his counselee’s performance with that of others in simi- 
lar circumstances, he must make a thorough investigation of the 
norms reported in any test manual. Being somewhat acquainted 
with the concept of norms, he may wonder, for instance, whether 
his counselee can be compared to the group represented by the 
norm. He will want to know about the communities and their rep- 
resentativeness. Did the norm group represent a chance and con- 
venient population or a random selection? Are there any sex 
differences? 

Much of the norm data presented in test manuals are scanty and 
in many cases the wording in the section on norms is confusing. 
In the manual for one test, for example, it is stated that the norms 





for this test were established through a national testing program. 
Approximately 190,000 tests were distributed to 200 communities 
in 37 states and 307 parochial schools in the diocese of a certain 
city. In a footnote, however, it is noted that: ". . . not all com- 
munities returned their tests in time to be included in the norma- 
tive population. The norms are based on the results from 148 
communities in 33 states where answers were recorded in test book- 
lets." (Italics added.) 

Just why the authors indicate in the main body of the text that 
190,000 tests were distributed to 200 communities in 37 states is 
difficult to understand. What happened to the results from the 307 
parochial schools? Are they incorporated in the normative popula- 
tion, and does it make any difference to the counselor whether or 
not they were used? He is not told the states in which the 148 
cooperating communities may be found although it is indicated 
later that there was a "wide geographical distribution." 

Such descriptions of norm groups must leave the counselor with 
many doubts. He will want to know whether the participating communi- 
ties were rural or urban or mixed. Were the pupils enrolled in all 
the usual kinds of curricular programs? What is the relative dis- 
tribution of boys and girls, especially in the upper grade levels 
where there may have been more dropouts? What was the socio- 
economic status of the communities? And what happened to those 
students in the parochial schools who received special attention? 
It is possible that the test authors did obtain a good cross section 
of high school youth, but the evidence that they did so is lacking. 

After the meager and inadequate description of the populations 
the authors go on to point out that, in order to facilitate calcula- 
tions, actual computations were based on only a 10 percent ran- 
dom sample of the group tested. They then indicated that "the 
. . . cooperating communities and . . . tested in each community 
should insure a cross section of the school population in the 
grades involved in the norms." (Italics added.) They should, but 
did they? 

Again the counselor must ask about the adequacy of a 10 percent 
sample. Was it enough? Was it truly random, since the method 
of getting randomness is not described? If random numbers tables 
were used, did the procedure really get proper samples of ethnic, 
sex, and socioeconomic groups? Does the taking of one sample by 
random numbers methods really assure randomness? Were the 
risks involved in the procedure of taking a 10 percent random 
sample of an inadequately described population worth taking 
merely to facilitate calculations? What will the counselor say to a 
student or his parents if they question the results and have to be 
told that the counselor himself questions them because, in order 
to facilitate calculations, the test authors used questionable pro- 
cedures? 
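
The doubt about a single 10 percent draw can be made concrete with a small simulation. This is a sketch only, assuming a hypothetical tested population of 1,000 pupils of whom 70 percent are urban and 30 percent rural; the names `population` and `subgroup_shares` are illustrative and do not come from any manual. It draws several 10 percent random samples and prints the share of each subgroup, so that the variation from one draw to the next can be compared with the known shares in the population.

```python
import random


def subgroup_shares(group_labels):
    """Return each label's share of the list as a fraction of the whole."""
    total = len(group_labels)
    return {label: group_labels.count(label) / total
            for label in sorted(set(group_labels))}


# Hypothetical tested population: 1,000 pupils, 70 percent urban, 30 percent rural.
population = ["urban"] * 700 + ["rural"] * 300

rng = random.Random(0)  # fixed seed so that a run can be repeated
for trial in range(3):
    # One 10 percent random draw, taken without replacement.
    sample = rng.sample(population, k=len(population) // 10)
    print("draw", trial, subgroup_shares(sample))

print("population", subgroup_shares(population))
```

Each draw is random in the technical sense, yet the subgroup shares of any one draw need not match those of the population. A manual that reports one draw without describing it leaves the counselor unable to judge how close the match was.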

Some publishers of tests simply indicate the size of their norm 
groups without presentation of enough evidence of the character- 
istics of the subjects to make meaningful interpretation possible. 
Nowhere in the literature issued by the publisher of one widely 
used test is there a clear-cut description of the original normative 
sample used in standardization nor is there any description of this 
group’s important characteristics, its location, socioeconomic com- 
position, or selection. Counselors working with individuals in vari- 
ous sections of the country and looking at these norms might raise 
questions as to how appropriate they are for their particular coun- 
selees. The counselor might be a little wary of using norm tables, 
too, without some information about such factors as these: pres- 
ence or absence of rural or urban youth; selection by type of train- 
ing institutions in the normative group; socioeconomic composition 
of the sample; and the achievement level of the groups included 
in the sample. Pertinent information of this sort is too often con- 
spicuous by its absence. Nor is there any explanation as to how 
items, for which no normative data are furnished, suddenly appear 
in diagnostic profiles. 




Until more adequate normative information is provided by test- 
makers the counselor must often rely upon his interpretations of 
the meaning of his counselee's scores. He may have to augment 
this scanty norm data by his own follow-up of students and local 
normative information as long as meaningful norms are lacking. 
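
One simple form of local normative information is a percentile rank computed against scores collected in the counselor's own school. The sketch below is a minimal illustration, not a prescribed procedure: the score list is invented, the function name `percentile_rank` is hypothetical, and the convention of counting tied scores as half below is one common choice among several.

```python
def percentile_rank(score, local_scores):
    """Percent of the local group scoring below `score`, counting ties as half."""
    below = sum(1 for s in local_scores if s < score)
    ties = sum(1 for s in local_scores if s == score)
    return 100 * (below + 0.5 * ties) / len(local_scores)


# Hypothetical raw scores gathered from one local class.
local_scores = [38, 42, 45, 45, 50, 53, 57, 60, 64, 71]

print(percentile_rank(53, local_scores))
```

A group of ten is far too small to serve as a norm. The point is only that the computation is within reach of any counselor who keeps his own records.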

RELIABILITY REPORTS IN TEST MANUALS 

The authors of many tests indicate that "the reliability of a test 
is the stability of the measures it yields." The counselor must take 
particular notice that the word "reliability" is used in its technical 
sense, defined as above, rather than in the usual dictionary mean- 
ing of the word as "the state of being reliable," when the word 
"reliable" is defined as "trustworthy." It is conceivable that a test 
may be highly reliable in the sense that it yields stable measures 
(the length of period of stability is not given) while at the same 
time it may not be trustworthy in counseling for such reasons as 
those given above in the section on validity. When the counselor 
or counselee plans to take some action such as hypothesizing about 
the future or helping to plan and make decisions, he is confronted 
with the question as to how closely current performance on a test 
correlates with later performance. He is rarely interested in mo- 
mentary measurement, but he is concerned about consistency of 
performance over a long period of time. Test authors rarely if ever 
provide such evidence of long-term consistency. They rely rather 
on evidence of consistency in performance at one sitting or on two 
that are completed within very short intervals. 

One test manual reports split-half coefficients for a population 
of nearly 300 students in Grades 7 to 9. Interform reliability was 
computed for less than 250 students in the same grades, and the 
authors stated that coefficients for other grade ranges (unspeci- 
fied) varied only slightly from the 14-year-old age range. The 
counselor would note in this case that other age ranges are not 
specified and that they varied an unspecified amount from the 




14-year value. Having all these statements, the counselor has three 
coefficients computed for lower age and grade levels with subjects 
about whom he knows next to nothing except that they lived in a 
certain state. He will wish that the authors had given him some 
more complete tables and further information on their subjects. 

The coefficients described above are only vaguely and generally 
helpful to the counselor when a counselee asks if he may take the 
test over again because he feels sure that he could do better next 
time. He might, since coefficients cannot be interpreted to most 
counselees and their parents, try to get some help from the report 
on the standard error of measurement. This in a certain test is 
reported as "approximately 3.2 standard score points for the entire 
range covered by the test." The counselor may be tempted to an- 
swer the counselee's question by indicating that, in general, on the 
whole, on the average, and other things being equal, if he is some- 
what like seventh to ninth grade pupils in certain cities in a certain 
state it is not likely that a second trial of the test (at some unspec- 
ified time) will yield scores that differ greatly from those that 
the first trial yields. Such necessary vagueness will not be comfort- 
ing to the eager inquiring counselee. 
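
The standard error of measurement is conventionally estimated from the standard deviation of scores and a reliability coefficient, as the standard deviation multiplied by the square root of one minus the reliability. A minimal sketch follows. The values used (a standard deviation of 10 and a reliability of .90) are assumptions chosen for illustration and are not figures reported in the manual under discussion; the function name is likewise hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional estimate: SD * sqrt(1 - r), where r is a reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Assumed values for illustration only.
print(round(standard_error_of_measurement(sd=10, reliability=0.90), 2))
```

Because the figure depends on both the standard deviation and the coefficient, a standard error reported without the group on which those two values were obtained tells the counselor little about his own counselee.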

It is common practice to use only a small sample of an original 
population in the computation of reliability coefficients. In one 
test, for example, less than 1 percent of the subjects in the norm 
group were used. No description of the sampling process is given, 
so the counselor might wonder just how the subjects were selected 
from a parent population more than 100 times as large. He might 
legitimately wish to have information as to their geographic loca- 
tion, type of schooling, sex, socioeconomic, racial, or other char- 
acteristics, and he might wonder what sort of sampling procedures 
were used in their selection. And although the variability of their 
scores in this test is reported in MA units (S.D. = 23 to 32) no 
information is given as to their average scores, their level of per- 
formances, or their educational achievement. In the same test no 
reliability coefficients were offered for the various subtests of the 




test, although users were urged to use a diagnostic profile of 
scores. Comparison of scores or subscores whose reliability is not 
known cannot be a useful procedure in counseling. 

It is common practice, too, to present reliability coefficients for 
several grade levels combined. This type of sampling which allows 
for maximum variability tends to produce a much higher reliability 
coefficient than one that is determined on a single grade level. 
In this approach the counselor is left with the untested assumption 
that the reliability estimate for his individual counselee who is in, 
say, the eleventh grade, will be similar to the data reported for a 
much more variable group in Grades 9 to 12. He is left with the 
responsibility of determining the real reliabilities for his own local 
population. 
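
The effect of combining grades can be sketched with a standard textbook adjustment for differences in group variability. The sketch rests on an assumption that should be stated plainly: the error variance is taken to be the same in the wide group and in the single grade. The coefficient and standard deviations below are invented for illustration, and the function name is hypothetical.

```python
def reliability_in_narrower_group(r_wide, sd_wide, sd_narrow):
    """Estimate reliability in a less variable group.

    Assumes the error variance found in the wide group also holds in the
    narrow group, so that r_narrow = 1 - (sd_wide**2 * (1 - r_wide)) / sd_narrow**2.
    """
    error_variance = sd_wide ** 2 * (1.0 - r_wide)
    return 1.0 - error_variance / sd_narrow ** 2


# Hypothetical figures: r = .90 reported for Grades 9 to 12 combined (SD = 15),
# applied to a single eleventh-grade group with SD = 10.
print(round(reliability_in_narrower_group(0.90, sd_wide=15, sd_narrow=10), 3))
```

Under these assumed figures a coefficient of .90 for the combined grades corresponds to about .775 for the single grade, which is why a coefficient computed on several grades together may flatter the test.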

The good counselor will not be naive in his interpretation of 
the standard errors of measurement that often appear in the relia- 
bility sections of test manuals. In one manual, for example, one 
may find the statement that the standard error of measurement 
was 3 points and that a pupil's score will be in error not more 
than 3 points 66 2/3 percent of the time. This statement may seem 
very good, and the counselor might interpret the information to 
mean that, for his counselee whose IQ score was 100, the chances 
are two to one that on successive retests the IQ score would fall 
somewhere between 97 and 103. This rather narrow range would 
seem to suggest that the obtained IQ score was rather stable. He 
must remember, however, that this figure applied only in general 
and may not apply in the case of his particular counselee. He must 
also consider the possibility that there is still one chance in three 
in the general situation that the IQ would go beyond that range. 
He would have to keep in mind, too, that the reported standard 
error of measurement covered a range of five grades. Since it would 
usually be a higher error figure for a single grade, the counselor 
might well wonder about the size of the standard error of measure- 
ment for the single grade in which his counselee happened to be. 
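
The arithmetic behind the "two to one" statement can be sketched as follows, assuming that errors of measurement are normally distributed, which is the usual assumption behind such statements but is not something a manual necessarily demonstrates. The function name `retest_band` is hypothetical.

```python
from statistics import NormalDist


def retest_band(score, sem, n_sems=1.0):
    """Return (low, high, probability) for a band of +/- n_sems standard errors.

    The probability is the chance that a normally distributed error of
    measurement falls inside the band.
    """
    half_width = n_sems * sem
    error = NormalDist(mu=0.0, sigma=sem)
    probability = error.cdf(half_width) - error.cdf(-half_width)
    return score - half_width, score + half_width, probability


low, high, probability = retest_band(score=100, sem=3)
print(low, high, round(probability, 3))

low2, high2, probability2 = retest_band(score=100, sem=3, n_sems=2.0)
print(low2, high2, round(probability2, 3))
```

A band of one standard error on either side of 100 runs from 97 to 103 and takes in roughly two thirds of the cases, which leaves about one chance in three of a score outside it. Doubling the band to 94 through 106 takes in about 95 percent, a range that looks considerably less reassuring than the one the manual invites the counselor to quote.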

Anastasi, op. cit., pp. 115-117. 




It is possible that the factor of reliability in some measurement 
may be overemphasized. Where the interpretation of the results 
may have long-range implications, the stability of the score be- 
comes important. That would appear to be the case with many 
tests since one of the assumptions underlying them is that ability 
to learn is a fairly constant quality. Some authors have placed 
major importance on the assumption of "constancy" when they 
write that ". . . degrees of 'brightness' are theoretically constant 
for a given child, being a fixed characteristic of the endowment, 
so that the child who is really below normal at one age will be so 
at all ages, and that an adult who is above normal was so much 
above normal, relatively, at any age of development. . . . Its truth 
is, in fact, the foundation of our hopes in testing intelligence, for 
if it is not true, in part at least, we cannot prognosticate, and in- 
telligence measurement will be of no great value." 

Written some 40 years ago, this concept is still a very important 
hypothesis underlying many tests. It is a fundamental assumption 
of those who use tests in attempts at prediction. This being the 
case, adequate evidence of the stability of scores is of more than 
passing concern to the counselor. 

Henry E. Garrett. "A Developmental Theory of Intelligence." American Psy- 
chologist, September, 1946, 1:372-78. 

Arthur S. Otis. "A Criticism of the Yerkes-Bridges Point Scale, with Alterna- 
tive Suggestions." Journal of Educational Psychology, March, 1917, 8:129-150. 

It may be of interest to summarize some research done on this point and re- 
ported by Traxler. (Arthur E. Traxler. "Reliability, Constancy, and Validity of the 
Otis IQ." Journal of Applied Psychology, April, 1934, 18:241-251.) Traxler re- 
ported on the Otis Self-Administering Tests of Mental Ability, Higher Examination. 
The reference is not inappropriate, since subsequent forms, the Gamma Em included, 
appear to be built upon this earlier test. Traxler noted that "Unless the IQ secured 
from a group test is highly dependable, marked injustice may be done in the 
classification and grouping of individuals because of the false information relative 
to mental ability." Commenting further on the significance of this point, Traxler 
stated that ". . . in view of the fact that IQ's found for a class at the time of en- 
trance to high school are frequently recorded and used for several years by the 
school, the correlation between forms of a test administered in successive years is a 
matter of considerable importance." With small groups (N = 85, 100, 75) graduat- 
ing from the University of Chicago High School, Traxler found stability coefficients 
ranging from .647 to .807, the mean of ten coefficients being .725 between forms 
administered a year or two years apart. Regarding the constancy of the IQ, Traxler 
reported that, of 885 changes studied, one fourth were of nine points or more and 




DIRECTIONS FOR ADMINISTRATION OF TESTS 


Authors of tests usually present to the persons who are to administer
tests admonitions concerning avoidance of influences that
might cause tenseness and anxiety, the guarding against interruptions,
the provision of supplies, and the need to become familiar
with the instructions. It is often suggested that the time limits be
adhered to within a margin of a few seconds, but "few" is frequently
not defined. There is occasionally a suggestion that time limits
should not create a feeling of pressure or nervousness on the part
of the pupil, but those who have administered such tests know that
close timing of tests is likely to create the tenseness and anxiety
supposed to be carefully avoided. And despite the cautions about
timing, some authors state that it was their intention to make a power
test rather than a speed test. In some cases it is suggested that in
above-average groups an entire class will finish a subtest before the
time for it has expired. Instead of using this time to let the subjects
go back to check their work, some directions suggest that the
examiner should continue with the directions for the next subtest.
It is possible for one student under such circumstances to influence
the whole testing procedure since, if he is the one who has not
finished, the examiner must wait (up to the limit of the "time,
within a few seconds") until all have finished. Some discretion is
often left to the examiner, and when that is permitted an opportunity
for error in test administration is created. There is also
some possibility for error when separate answer sheets are used,
since after some tests are completed the students are asked to
retain them to blacken their marks and erase any stray ones.


that one was as high as 25 points. Though noting that, when compared with
constancy data of other group tests, the Otis change was relatively small, he stated
that ". . . about one Otis IQ in six changes so rapidly that marked inaccuracies
might occur if one test alone was used for purposes of classification or grouping in
high school." Even more marked individual IQ score variability was reported by
Nancy Bayley. "Consistency and Variability in the Growth of Intelligence from
Birth to Eighteen Years." Journal of Genetic Psychology, December, 1949,
75:165-196.




Discussions of the effect of such factors as fatigue, health,
test-wiseness, and distractions were presented in detail in Chapter III.
It will suffice here to remind the reader that such factors may
influence scores whenever tests are given, and that some of them may
be a function of the way the test is administered. They will not be
evident from the usual entry of test scores on a cumulative record.

It is possible, of course, that the student was in "top condition,” 
that the "human factors” were all favorable, but that all was not 
well in the administration. The counselor can only speculate as to 
which of many influences upon a test score may have affected his 
counselee’s score if the directions were not followed precisely. If 
he is especially sensitive to such matters as these, however, he may 
be a little disturbed when he reads suggestions in some manuals 
that the tests are self-administering and that it is merely necessary 
to pass out the booklets, allow the pupils time to study the first
page with a minimum of directions, and let them go ahead and 
take the test. 

The casualness of such an approach may appear to the counselor 
to be more than a little perfunctory and he may wonder about the 
outcome of “a minimum of directions.” There may be definite
advantages, in the press of other work, to test all the pupils in a 
school in a day, as is proposed in one manual, but it may occur to 
the counselor that the proposed approach, rather than assuring 
"reasonable uniformity,” may actually encourage considerable vari- 
ability unless the teachers left in charge are as well trained as he is. 

As he inspects the directions that the students are to read he 
may wonder further what his students' reaction to the following 
might have been. “This test contains 90 questions. Do the best 
you can, though you are not expected to be able to answer all of 
them. After the examiner tells you to start, you will be given a 
half hour. Answer as many questions right as possible. Do not 
go so fast that you make mistakes. Do not spend too much time on 
any one question. No questions about the test will be answered by 
the examiner after the test begins.” 




Did the student heed all this advice? Since he was not supposed 
to be able to answer all the questions, did he relax and take it easy? 
Or, since the task looked somewhat staggering for 30 minutes, did 
he tighten up because of his apprehension? And since not all the 
types of items found in the test are demonstrated in the directions, 
can the counselor assume that he understood what he was to do on 
each new one as he came to it? The examiner’s oral directions call 
for asking, "Is there anyone who does not understand how to 
answer the samples?" But was the student one who, in a large group, 
did not want to let others know that he did not understand what he 
was to do? If the answer to some of these queries is positive, the 
counselor may have revealed more factors that can influence test 
scores. 

DIRECTIONS FOR SCORING AND RECORDING OF TEST SCORES

As the counselor looks at a test result on a cumulative record, 
he may recognize the importance of accuracy in transcribing the 
score and IQ from the test answer sheet. He may be inclined to 
assume that it was done correctly, but errors have been known to 
occur. The implications of the misreading of a number or the inad- 
vertent recording of a score on the wrong cumulative record need 
no further comment. The recording of the score is preceded by the 
computation of the IQ and, in the case of some tests, this is a
relatively simple procedure.

But assuming that the recording and computation of the score 
have been done accurately and checked well, the counselor may 
continue in his reverse chronology of the development of a test 
score and speculate on the possibilities of an error in the initial 
scoring of the answer sheet. Hand scoring can become tedious if 
done for any period of time, and an error may not be recognized in 
the deceivingly innocuous-looking "number" on the cumulative 
record form. 

One of the claimed features of many recently developed tests
is the rapid method of stencil-scoring. A test is scored by placing
a single punched stencil over the answer sheet so that only the
right answers appear. The total score is the number of marks
appearing through the punched holes. Such an arrangement assumes
that the students have followed the directions, "Never put more
than one mark in any row of spaces." This, of course, leaves
something to be desired, for it is possible that an enterprising student,
unable to reduce the alternatives to fewer than two, may check both.
The right one may show through the punched hole while the
wrong answer remains covered. Some authors have anticipated this
possibility by suggesting in the manual that "if in the case of any
item two marks have been put in the same row of spaces, draw a
colored line through the row of answer spaces and allow no credit
for that item." The recognition of the above possibility goes only
part way in that it suggests that this be done if one happens to
notice several marks. The directions should be more explicit and
should require the test users to screen every answer sheet before
scoring with the stencil. To make such a positive statement,
however, would result in an extension of scoring time and a reduction
in one of the suggested major features of objective tests: quick
scoring. Another approach, of course, would be to provide a
"wrongs" stencil through which all but the right answers would
appear. This, too, would require another handling of answer sheets
and again eliminate the quick-scoring feature to some degree. If
the test administrator were conscientious and were aware of the
above possibilities, it seems likely that he would sacrifice speed
for accuracy, though not specifically told to do so. If not, scoring
errors of this kind might conceivably be represented in the score
on the student's record.
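The scoring hazard described above can be illustrated with a minimal Python sketch. It is not taken from any test manual: the function names, the four-item key, and the sample answer sheet are all hypothetical, chosen only to show how a "rights" stencil credits a double-marked item and how screening each sheet first removes that credit.

```python
from typing import Dict, List, Set

# Hypothetical data shapes, assumed for illustration only.
Marks = Dict[int, Set[str]]   # item number -> answer spaces the pupil blackened
Key = Dict[int, str]          # item number -> keyed (right) answer space


def double_marked(marks: Marks) -> List[int]:
    """Screen a sheet: list the items with more than one mark in the row."""
    return sorted(item for item, chosen in marks.items() if len(chosen) > 1)


def stencil_score(marks: Marks, key: Key) -> int:
    """Count marks visible through a 'rights' stencil.

    Only the keyed space is visible, so a second mark in the same row
    stays hidden and the item is still credited.
    """
    return sum(1 for item, right in key.items() if right in marks.get(item, set()))


def screened_score(marks: Marks, key: Key) -> int:
    """Score after screening: double-marked items receive no credit."""
    voided = set(double_marked(marks))
    return sum(
        1
        for item, right in key.items()
        if item not in voided and right in marks.get(item, set())
    )


key = {1: "b", 2: "d", 3: "a", 4: "c"}
sheet = {1: {"b"}, 2: {"a", "d"}, 3: {"c"}, 4: {"c"}}  # item 2 is double-marked

print(double_marked(sheet))        # items that screening would flag
print(stencil_score(sheet, key))   # unscreened stencil count
print(screened_score(sheet, key))  # count after voiding flagged items
```

On this sample sheet the unscreened stencil count is one point higher than the screened count, which is the kind of error that would pass unnoticed into the cumulative record.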

Provisions are made for machine scoring of many tests and it
is assumed that anyone who attempts machine scoring will be
thoroughly acquainted with the technique, which may be reasonable
enough. The following quotation from one test manual suggests
to the counselor some of the possibilities for scoring error if a
machine is used.

Scan each answer sheet carefully before it is scored. [This would
have been a sound recommendation for hand scoring too.] Where
more than one answer has been marked for an item erase all marks
for the item. Erase any stray marks made in the answer space or
margins, as even very small and light marks are sometimes sensed by the
machine. If the pupil has failed to make complete erasures, make a
clean erasure. If the marks are too light, go over them with one of
the special lead pencils. Check carefully by hand a certain proportion
of the answer sheets to insure maximum accuracy.**

With all these possibilities for error in machine scoring, the 
counselor will not want to assume too much regarding the accuracy 
of his test score unless he has taken all these precautions or is 
confident that they were taken by the person who administered the 
test. 

Going back another step in the process and assuming that the
score has been properly transcribed and accurately scored in the
first place, the counselor may well consider the possibilities of the
effect of the human factor in the score he is contemplating. Here
such things as the attitude of the student at the time he took the
test, his health, comprehension of directions, and other factors can
conceivably be represented in the "number" without being evident
unless noted specifically on the record.

As the counselor looks again at test manuals he will consider
the assumptions and procedures involved in scoring. He will find
that many test forms are designed to permit more rapid scoring
by a perforated key and he must wonder if the gains in speed of
scoring have resulted in obtaining less valuable scores more
rapidly. No clear-cut evidence on this point is available. The counselor
may wonder, too, regarding the scoring of a test, by what feat of
insight a test-maker can decide that each item in the test has exactly
the same value as another. Studies in the value of weighting items
are so inconclusive that the allotting of equal values to all items
must still be considered a questionable procedure. If, of course,
the validity of a test is high there will be no need to question the
scoring procedure, but since, in many tests, the validity is
questionable, the scoring procedure must be one of the factors that is
suspect.

** Otis Quick-Scoring Mental Ability Tests, op. cit., p. 5.

SUMMARY 

In this chapter the factors inherent in a test score, factors which 
may influence the score, and the implications of the factors for the 
interpretation of scores have been presented. It has been pointed 
out that the counselor needs to consider these factors both in the 
initial selection of a test and, ultimately, in the appraisal of the 
scores he obtains from the tests he has selected. It has been noted 
that each test available to the counselor is different, and that the 
value and usability of the test results will vary accordingly. In
effect, the discussion in this chapter illustrated practical
applications of the criteria for the selection of a test presented in the
previous chapter. The examination processes described in this and the
following chapter are those that counselors, in cooperation with 
other members of a school staff, might employ in the selection of 
a test for use in counseling. It appears likely that thorough study 
of a test in the manner described here would result in recognition 
of the limitations and values of tests in helping counselees to help 
themselves to make important personal decisions. 

Discussion Questions and Exercises 

1. Select three mental ability tests and examine the manuals for con- 
cepts of mental ability represented. What differences or similarities 
do you find in the concepts? Do the differences in concepts appear 
to be reflected in the types of items used in the respective tests? 
Do the test authors present evidence in support of their particular 
concepts of mental ability?

2. It was suggested in this chapter that the group upon which a test
author tried out his test items would have bearing on ultimate test
scores earned by individual students. Why is this the case? What 
differences would you expect to find in such item trials between 
groups selected from traditional Eastern college preparatory 
schools and groups selected from small, bilingual communities of 
the Southwest? Between rural and urban groups? 

3. The manuals of some tests of mental ability suggest that the test
results may be used to assist students in the choice of an occupation.
What evidence would you wish to have in substantiation of
such a claim? Do you think occupations can be classified in terms 
of the mental ability required? Why or why not? 

4. Prepare a critical review of a test of mental ability based on the 
pattern presented in this chapter. After completing the review, 
compare your review and analysis with that of reviewers in the 
Mental Measurements Yearbooks. To what extent are you in agreement?
Can you defend your stand on points of disagreement? Are
the reviews in the Yearbook consistent with each other? What
factors might account for differences among the Yearbook
reviews?

5. It has been said that "the old tests are the best." On what
assumptions might such a statement be based? Do you agree with the
statement? How might the "age” of a test influence the test scores 
of today's youth? 

6. Some attempts have been made to set up scales for the evaluation 
of tests on 100-point scales. In them 25 points may be given for 
validity, 20 points for reliability, ten for clarity of instructions, etc. 
Is any such system defensible? Why or why not? 

7. Select from your test library any widely used aptitude test other 
than one that has been considered in this chapter. Using the list of 
factors noted in this chapter, write your own evaluation of the 
test. When you have finished compare your report with the reviews 
of it in Buros' Mental Measurements Yearbooks. What factors may
have produced differences between your evaluation and those of
the reviewers in the yearbooks?

8. Choose five tests that are recommended for use in counseling. If
there is a statement in their manuals that scores may be used in
educational and vocational guidance, examine and report on the
evidence offered to justify the statement.

References 

American Educational Research Association. "Psychological Tests and
Their Uses." Review of Educational Research, February, 1947,
17:1-128.

American Educational Research Association. "Educational and
Psychological Testing." Review of Educational Research, February,
1953, 23.

American Educational Research Association. "Educational and
Psychological Testing." Review of Educational Research, February,
1956, 26.

Anastasi, Anne. Psychological Testing. New York: Macmillan, 1954.

Buros, O. K. The Fourth Mental Measurements Yearbook. Highland
Park, N.J.: Gryphon Press, 1953.

Coleman, William, and Cureton, Edward E. "Intelligence and
Achievement: The 'Jangle Fallacy' Again." Educational and
Psychological Measurement, Summer, 1954, 14:347-351.

Cronbach, Lee J. Essentials of Psychological Testing. New York:
Harper, 1949.

Crowder, Norman A. "The Holzinger-Crowder Uni-Factor Tests."
Personnel and Guidance Journal, January, 1957, 35:281-287.

Doppelt, Jerome E. "Progress in the Measurement of Mental
Abilities." Educational and Psychological Measurement, Summer, 1954,
14:261-264.

Dreger, Ralph M. "Different IQ's for the Same Individual Associated
with Different Intelligence Tests." Science, December, 1953,
118:594-595.

Durrell, Donald D. "Learning Difficulties Among Children of
Normal Intelligence." Elementary School Journal, December,
1954, 55:201-208.

Freeman, Frank S. Theory and Practice of Psychological Testing.
(Revised Edition.) New York: Holt, 1955.

Garrett, Henry E. "A Developmental Theory of Intelligence."
American Psychologist, September, 1946, 1:372-378.

Kelley, Truman L. Interpretation of Educational Measurements.
Yonkers, N.Y.: World Book, 1927.

Knezevich, Stephen. "The Constancy of the IQ of Secondary School
Pupils." Journal of Educational Research, March, 1946,
39:506-516.

Lennon, Roger T. "A Comparison of Results of Three Intelligence
Tests." Test Service Notebook, No. 11. Yonkers, N.Y.: Division
of Test Research and Service, World Book. (Undated.)

Lindquist, E. F. (Ed.). Educational Measurement. Washington,
D.C.: American Council on Education, 1951.

Otis, Arthur S. "An Absolute Point Scale for the Group
Measurement of Intelligence." Journal of Educational Psychology, May,
1918, 9:239-261.

Rulon, P. J. "On Concepts of Growth and Ability." Harvard
Educational Review, Winter, 1947, 17:1-9.

Schmidt, Louis G., and Rothney, John W. M. "Relationships Between
Primary Mental Abilities Scores and Occupational Choices."
Journal of Educational Research, April, 1954, 47:637-640.

Stewart, Naomi. "AGCT Scores of Army Personnel Grouped by
Occupation." Occupations, October, 1947, 26:5-41.

Super, Donald E. Appraising Vocational Fitness. New York: Harper,
1949.

Super, Donald E. "The Tests of Primary Mental Abilities.
Comments." Personnel and Guidance Journal, May, 1956, 35:577-578.

Super, Donald E. "The Holzinger-Crowder Uni-Factor Tests.
Comments." Personnel and Guidance Journal, January, 1957,
35:287-288.

Thorndike, Robert L., and Hagen, Elizabeth. Measurement and
Evaluation in Psychology and Education. New York: Wiley, 1955.

Thurstone, Thelma G. "The Tests of Primary Mental Abilities."
Personnel and Guidance Journal, May, 1956, 35:569-577.

Travers, Robert M. W. Educational Measurement. New York:
Macmillan, 1955.

Traxler, Arthur E. Techniques of Guidance. (Revised edition.) New
York: Harper, 1957.

Traxler, Arthur E. "Reliability, Constancy, and Validity of the Otis
IQ." Journal of Applied Psychology, 1934, 18:241-251.



CHAPTER V 


The Use of Standards in Test Selection


Reference has been made earlier to the publication in 1954 of
the Technical Recommendations for Psychological Tests and
Diagnostic Techniques.¹ It was pointed out in Chapter III that the
publication of these recommendations represented an important
event in the area of measurement. The recommendations suggested
that many tests had serious shortcomings and pointed to the way
in which improvements might be made.

The need for standards of some kind was suggested by the 
authors of the recommendations in this statement. 

Professional workers agree that test manuals and associated aids to test
usage should be made complete, comprehensible, and unambiguous,
and for this reason there have always been informal "test standards."
Publishers and authors of tests have adopted standards for
themselves, and standards have been stated in textbooks and other
publications. . . . Until this time, however, there has been no statement
representing a consensus as to what information is most helpful to a
test consumer. In the absence of such a guide, it is inevitable that
some tests appear with less adequate supporting information than
others of the same type, and that facts about a test which some users
regard as indispensable have not been reported because they seemed
relatively unimportant to the test producer. This report is the outcome
of an attempt to survey the possible types of information that test
producers might make available, to weigh the importance of these,
and to make recommendations regarding test preparations and
publication.²

¹ Technical Recommendations for Psychological Tests and Diagnostic Techniques.
Supplement to the Psychological Bulletin, March, 1954, 51:1-38.


Many tests appeared before the thinking of theorists and
professional groups crystallized into a definite statement of standards
for psychological tests. The manuals accompanying these tests
presumably represented what the authors and publishers regarded
as adequate information about the tests and their use. They could
not have been expected to anticipate all the recommendations that
appeared at a much later date because they represent the thinking
of many persons and are the product of an assembly of
measurement "talent" not available to individual publishers. Since they
are now available, publishers may well review all test manuals
published before the Technical Recommendations appeared with a
view to revision in the light of the new standards.³ In any case it
seems reasonable to expect that test manuals published after the
recommendations became available should reflect the standards
presented therein. The degree to which this expectation has been
met in the case of one such test and manual will be discussed in the
following pages by making a direct comparison of statements taken
from the test manual and from the standards in the Technical
Recommendations report.

Before the test manual is examined in the light of the
Recommendations a discussion of their development, scope, general
organization, and content will be presented. It may aid the reader in
following the analysis of a test and in gaining a better appreciation
of the importance of the document itself.

² Ibid., p. 1.

• Tt.i .. L, *.! uusiri „ n, „„

a" ’■'''““h da

TECHNICAL RECOMMENDATIONS FOR PSYCHOLOGICAL 
TESTS AND DIAGNOSTIC TECHNIQUES 

ORIGIN 

The recommendations were designed to provide guides for the 
test author in development of tests, for the publisher in preparing 
and presenting data needed for effective use of tests, and for the 
person who must ultimately select and use them in various situa- 
tions. 

The document was the product of successive revisions of an original
draft prepared by the Committee on Test Standards of the Ameri- 
can Psychological Association. It represents the cooperative efforts 
of this original group, a committee of the American Educational 
Research Association, and a committee of the National Council on 
Measurements Used in Education. These professional bodies con- 
stitute the major measurement groups in the United States and no 
document in any field could have a more authoritative source or 
sponsorship. 

DEVELOPMENT AND SCOPE 

The history of psychological testing covers a span of some fifty 
years or less. In that time, however, thousands of tests have been 
produced. Some of the variability in quality of tests may exist
because the field of testing is still in relatively early stages of 
exploration and experimentation. Part might be accounted for by 
hasty attempts to meet competition in a highly competitive sales 
market, and some of it can certainly be attributed to the lack of
specific standards backed by strong professional support. 

While there has long been a general concern about the
improvement of testing among professional persons in the field, the idea
of "standards" has met with some resistance. This resistance is
based, at least in part, on the grounds that "standards" might
inhibit innovation and experimentation in test construction. Some
of this early philosophy of test development is reflected in the
following statement made by a committee of the American
Psychological Association in 1906: "Let many tests be tried, each new
investigator introducing his own modification, and then, the
worthless will gradually be eliminated and the fittest will survive."

Unfortunately, the worthless have not been eliminated. The
staying power of some demonstrably inferior tests has been
phenomenal. The members of the committee that prepared the
recommendations in 1954 were not unaware of the fact that specifications
for tests might discourage the development of new tests.⁴ They
believed, however, that "appropriate standardization of tests and
manuals need not interfere with innovation." The recommendations
were intended by the committee to provide assistance to the
producers of tests "to bring out a wide variety of tests . . . and
to make those tests as valuable as possible."

The general principle or concept underlying the Technical 
Recommendations is that "a test manual should carry information 
sufficient to enable any qualified user to make sound judgment
regarding the usefulness and interpretation of the test."

The recommendations are significant in that they suggest,
directly or by implication, the kinds and quality of data that must
be gathered before a test is released for use. They emphasize the
fact that a test manual should leave the test user with an accurate
impression of the test that goes beyond literal truthfulness. Test
manuals must be written in such a way that those with limited
training will not get a distorted idea of what the test will do.⁵ At
the same time, they "should be sufficiently complete for specialists
in the area to judge the technical adequacy of the test."

⁴ Technical Recommendations, op. cit., p. 1.

ni to trfiuni™ the tuiotetioos troio the
St ““ sod, suteoeet!

lee.e .. thote he,, m «Kbed . hijh depee of «.5hi>ticetfoa dxtot

In preparing the standards, no attempt was made to set
statistical specifications as they relate to validity and reliability
coefficients, standardization, or other quantitative aspects of tests but
the need for enough data on such factors to permit judgment of
their adequacy is stressed.⁶ The user must assume the
responsibility for estimating the adequacy of the data presented before he
employs it. He must decide whether they are sufficient in quality
and quantity.

The standards presented in the document are intended to apply
to virtually the whole range of measurement instruments, from
achievement, ability, and aptitude tests through interest and
personality inventories, projective instruments, and related clinical
techniques. While some standards apply to all such devices, others
are rather specific to particular types. In this respect the document
recognizes several levels of test development. The highest degree
of development, it is pointed out, is needed when tests are used in
"practical" situations, where the user cannot obviously validate the
test for his own purposes and must rely on the manual for data
supporting the stated purpose and uses of the test. The
recommendations are directed primarily to tests falling in this category.
These are the type most frequently employed by the counselor. Not
all tests, of course, are of this type. The effective use of some types,
such as the projective tests, is dependent upon the clinical
interpretation of qualitative responses. Arguments to the contrary
notwithstanding, the document holds that these devices, too, should
be accompanied by appropriate evidence about validity, reliability,
and other factors related to test interpretation.

LEVELS OF RECOMMENDATIONS

The Technical Recommendations is essentially a listing of 165
statements, each representing a standard related to some particular
aspect of test presentation. The standards are subsumed under the
general topics of Dissemination of Information, Interpretation,
Validity, Reliability, Administration, and Scales and Norms. Each
standard is further classified in terms of its relative implications
for the operational use of a test. Thus, the individual standards are
designated as essential, very desirable, or desirable, in importance.
The categories are defined in the document as follows:

⁶ Technical Recommendations, op. cit., p. 2.

The ESSENTIAL standards indicate what information will be
genuinely needed for most tests in their usual applications. When a
test producer fails to satisfy this need, he should do so only as a
considered judgment. In any single test, there will be very few
ESSENTIAL standards which do not apply. . . . A test manual can satisfy
all the ESSENTIAL standards by clear statements of what research
has and has not been done and by avoidance of misleading statements.

The category VERY DESIRABLE is used to draw attention to types
of information which contribute greatly to the user's understanding
of the test. They have not been listed as ESSENTIAL for a variety of
reasons. For example, if it is very difficult to acquire information
(e.g., long-term followup), it cannot always be expected to
accompany the test. At times a closely reasoned minority opinion
regards a type of information as unimportant. Such information is
still very desirable, since many users wish it, but it is not classed
as essential so long as its usefulness is debated.

The category DESIRABLE includes information which would be
helpful, but less so than the ESSENTIAL and VERY DESIRABLE
information.⁷

The application of the standards, reflecting each of these
categories as applied to one of the topics, Dissemination of
Information, is presented in the following examples. It is regarded as
essential, for instance, that:

A2.2 When a test is revised or a new form is prepared, the manual
should be thoroughly revised to take the changes in the test into
account.⁸

⁷ Ibid., pp. 5-6.

Furthermore, it is regarded as very desirable that: 

A2.21 When a short form of a test is prepared by reducing the
number of items or organizing a portion of the test into a separate form,
new evidence [should] be obtained and reported for that new form
of the test.⁹

It is desirable that: 

A2.22 When a short form is prepared from a test, the manual
[should] present the correlation between the long and short forms,
separately administered.¹⁰

There would be little debate about the fact that a revision of a 
test, implying new items, would necessitate treatment of standard- 
ization, validity, reliability, and norms as thoroughly as that re- 
quired for a new test. This being the case, many of the data in an 
existing manual would not apply to the revision and thorough re- 
working of the manual would be essential. In general, it could be 
reasoned that a short form of a test is a new test and hence it 
would be equally essential, rather than just very desirable, that the 
manual be revised. Within the framework of the document’s defi- 
nition of levels, however, the placing of the second standard in 
the category of very desirable may be appropriate. It is reasonable, 
too, that if sufficient data are presented for a short form of a test 
so it can stand alone, it would not be essential that the correlation 
between the long and short form be presented. Knowledge of the 
degree to which the two correlate, however, might be helpful in
making a decision as to which form to use in specific instances. 

Should the reader feel that the definitions of very desirable and 
desirable represent too much of a compromise, he may allay his 
fear in part by noting Table 19 below. Nearly three fourths of the 

® Ibid., p. 9. 

* Ibid., p. 9.

Ibid., p. 9.





Table 16. Distribution of Standards by Topic and Category Level *

                                       Very                  Not
Topic                      Essential   desirable  Desirable  categorized  Total

A. Dissemination of
   Information
B. Interpretation             14
C. Validity (general)
   Predictive                 31
   Concurrent
   Construct                               4
D. Reliability (general)      11           3
   Equivalence of forms
   Internal consistency        6
   Stability
E. Administration and
   Scoring                     4           2          1
F. Scales and Norms           17           8          4

   Total                     114          39         11          1         165

* Source: Technical Recommendations for Psychological Tests and Diagnostic
Techniques. Supplement to the Psychological Bulletin, March, 1954.


total of 165 standards are categorized as essential. Those cate-
gorized as very desirable (which cannot be too easily ignored by
test authors and publishers in the future), in combination with
essential, account for over 90 per cent of the total.
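
The proportions cited can be checked directly from the four category totals in the bottom row of the table (114, 39, 11, and 1). The short sketch below does that arithmetic; the variable and function names are illustrative only.

```python
# Totals row of the table of standards: number of standards at each level.
totals = {
    "essential": 114,
    "very desirable": 39,
    "desirable": 11,
    "not categorized": 1,
}

grand_total = sum(totals.values())


def share(*levels):
    """Fraction of all standards that fall in the named levels."""
    return sum(totals[level] for level in levels) / grand_total


essential_share = share("essential")
top_two_share = share("essential", "very desirable")

print(grand_total)
print(f"{essential_share:.1%}")
print(f"{top_two_share:.1%}")
```

On these figures the essential standards alone come to roughly 69 per cent of the 165, and essential and very desirable together to just under 93 per cent.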

It will be noted further that more than one third of the stand- 
ards relate to validity, the aspect of tests of most concern to the 
counselor. By far the greatest demands are placed on standards 
surrounding predictive validity, required most often in the coun- 
seling situation. 

LIMITATIONS 

While the Technical Recommendations represent an important
milestone in the history of measurement, it must not be thought
that all that remains is to have authors and publishers conform.
The present document does not represent the final word, nor does
it represent a present ideal. It has limitations. 




Some of the limitations are those resulting from the very nature
of psychological testing as a field. It is unlikely that the last word
can ever be written in a field based on continuous research and 
experimentation and which produces new concepts, products, and 
applications. 

Other limitations are due, in a sense, to human factors. In spite 
of the fact that the committees authoring the document represent 
high authority in the field of testing, it is unlikely that this group 
or any group could anticipate all aspects of testing or the ques- 
tions and exigencies arising from the use of tests. 

Still other limitations relate to the element of compromise, both 
of professional opinion and between the ideal and the practical. 

These limitations are acknowledged in the document itself: "De-
spite the care with which the standards have been developed, ex-
perience will no doubt reveal that some of our judgments would
benefit from further examination. New tests will present problems 
not considered in the present work. The improvement of statistical 
techniques and psychometric theory will yield better bases for test 
analysis. The efforts of test producers will lead to continued im- 
provement in tests, and as this continues it will be possible to raise 
the standards.’’” 

Compromise is almost always inevitable when several groups 
attempt to evolve standards on which all can agree. That compro- 
mise played a part in the preparation of the Technical Recom- 
mendations is evident in the following: "In arriving at those 
requirements (as to information accompanying published tests),
it has been necessary to judge what is presently the reasonable de- 
gree of compromise between pressures of cost and time, on the 
one hand, and the ideal on the other.” ” This compromise is re- 
flected in part by the levels that were evolved for the various 
standards, and is noted particularly in the explanation of the level 
termed very desirable: "At times, a closely reasoned minority opin- 

Ibid., p. 7.

Ibid., pp. 2-3.




ion regards a type of information as unimportant. Such informa- 
tion is still very desirable, since many users wish it, but it is not 
classed as ESSENTIAL so long as its usefulness is debated.’*

Not all that is known at this time about good test practices is 
included in the present standards. The standards related to the 
administration of tests, for instance, are minimal. If followed lit- 
erally and exclusively in a test manual, the test user would find 
that many questions related to administrative practices remained 
unanswered.’* 

That the standards are not as stringent as present knowledge 
could have them is suggested in another statement in the docu- 
ment: "Ideally manuals should be tested in the field by comparing 
typical readers’ conclusions with the judgment of experts regard-
ing the test. In the absence of such trials, our recommendations are
intended to apply to the spirit and tone of the manual as well as 
its literal statements.” 

One further limitation relates to the problem of enforcement.
This is not a limitation of the document itself, for there is obvi- 
ously no way to build enforcement into it, but is a problem of 
considerable importance. A proposal that a "Bureau of Test 
Standards” be planned in connection with Technical Recommen- 
dations was not favorably received at the time the American Psy- 
chological Association Committee on Test Standards was set up. 
The standards as presented are intended to be used without refer- 
ence to enforcement machinery. It is unfortunate, in one sense, 
that the consumer is not protected in the area of psychological 
testing as well as he is in the area of patent medicines and drugs. 
The test user, then, must still screen the tests he uses. The Tech-
nical Recommendations will help him do the job.

The Cooperative School and College Ability Tests were among


Ibid., p. 6.

“.“f* AM//, T„U 

Ibid., p. 2.




the first tests to be published after the recommendations were pub- 
lished. The remainder of this chapter is devoted to an evaluation 
of the Manual prepared to accompany this series. The standards
presented in the Technical Recommendations appropriate to this
type of test will be applied to determine the adequacy of the test.
The procedure used in the following pages is one that counselors
might well employ before they purchase a test.

APPLICATION OF RECOMMENDATIONS TO THE COOPERATIVE 
SCHOOL AND COLLEGE ABILITY TESTS « 

The tests in the SCAT series are designed to help "teachers and 
counselors — and students themselves — to estimate the capacity of 
each individual student to undertake the academic work of the 
next highest level of schooling.” The series consists presently of 
tests suitable for use at five levels: college freshmen (Level I),
senior high school (Level II), Grades 8, 9, and 10 (Level III),
upper elementary (Level IV), and intermediate elementary (Level
V). A sixth level, suitable for superior college sophomores, is ten-
tatively planned.

The principal objective of the series, as stated in the Manual, is 
to provide continuity of measurement over a long range of years, 
extending from about the fourth grade through the sophomore 
year in college. The series, according to the Manual, should "make 
it possible to chart and study the growth of individual students 
over a range of years not now possible.” 

The SCAT tests were developed as an alternative to revision of 
the American Council on Education Psychological Examinations. 
Like the older ACE, the SCAT yields scores in verbal and quanti- 
tative areas and a total representing the combined scores.’^ The 
content and approach of the new test differ from that of the ACE, 

The Cooperative School and College Ability Tests. Princeton, N.J., Cooperative
Test Division, Educational Testing Service, 1955.

Ibid., p. 5. 




however, since the SCAT attempts to get at school-learned abilities
that are "critical prerequisites to next steps throughout the range
of general education." These abilities are described as "compre-
hending the 'sense' of a sentence read, attaching meaning to isolated
words, manipulating numbers and applying number concepts accu-
rately in a computation situation, and solving quantitative prob-
lems."

The publishers have announced that they intend to "encourage 
the use of SCAT as the best we have to offer for measurement 
of academic ability." They recognize, however, that immediate 
change-over from the ACE will be neither practical nor desirable
in some cases and they plan to make the ACE available at least
until 1959. To aid in the transition, the publishers have equated 
the most recent college and high school editions to the SCAT score 
scale. This will make it possible for schools to change over, if they 
desire to do so, without loss of valuable local norm data on the
ACE that may have been collected over a long period. 

Some fifteen uses for the tests as they relate to teaching, coun-
seling, and administration are listed on page 2 of the Manual.
Those related specifically to counseling follow: 

. . . when the tests are used for their principal purpose, the counse-
lor can apply the results in his work with students to:

a. help the student to understand his own strengths and weaknesses
in comparison with students in certain norming groups;

b. guide the student toward choices of educational goals and courses
most appropriate for him;

c. estimate the levels of achievement to be expected of the student;

d. compare the measured academic abilities of students in different
class, grade, and school groups.

The extent to which these suggested uses are supported by ap-
propriate validity data as well as the extent to which the Manual
meets other standards of the Technical Recommendations will be 
noted in the analysis below. It is hoped that the method of analysis 




applied here may be useful to those counselors who are required
to select tests.” 

A. DISSEMINATION OF INFORMATION 

TECHNICAL RECOMMENDATION 

A1. When a test is published for operational use, it should be ac-
companied by a manual which takes cognizance of the detailed recom-
mendations in this report. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This initial recommendation appears to be met. Direct reference 
to the Technical Recommendations is made in the discussions of 
validity and reliability. The Manual points out clearly the limita- 
tions of the instrument. It contains discussions (page 9 of Man- 
ual) of "Inferences that validity data and arguments are intended 
to support," “Inferences of the kind which are at present sup- 
ported only by assumption of validity,” and "Inferences of the 
kind that are not supported by any evidence of validity, and which 
the user of the tests should avoid in every case." Such statements 
as "The usefulness of tests in the SCAT series as predictors of 
future school or college success is as yet only assumed,” further 
attest to the efforts of the publishers to meet the technical recom- 
mendations through acknowledgment of areas of weakness and 
limitations. 


TECHNICAL RECOMMENDATION 


A1.1 Some form of manual, presenting at least minimum informa-
tion, should be given or sold to all purchasers of the test. ESSEN-
TIAL.

The letter-number codes that appear in the paragraphs of the analysis are the
ones used in the Technical Recommendations to designate the specific standards
quoted from the document. There are gaps in the letter-number designations be-
cause all the standards do not apply to tests of this type.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

A Manual is provided in which data pertinent to the test are
presented in great detail. As with some other tests marketed in
recent years, the Manual is not furnished with the test but must
be bought separately. This seems to be justified since the 57-page
Manual represents more expense to the publisher than the very
brief and inadequate manual commonly offered. The tests them-
selves are accompanied by 14-page pamphlets containing directions
for administration, scoring, and discussion of the interpretation of
individual scores and group interpretations.

TECHNICAL RECOMMENDATION 

A1.2 Where the information is too extensive to be fully reported in
such a manual, the manual should summarize the ESSENTIAL in-
formation and indicate where further details may be found. ESSEN-
TIAL.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

As indicated in A1.1 above, the Manual itself presents pertinent
information in considerable detail. Since the test is new, no body 
of information regarding the tests, other than that reported in the 
Manual, has developed up to the time of this writing. 

TECHNICAL RECOMMENDATION

A2. The manual should be up-to-date. It should be revised at ap-
propriate intervals. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The SCAT series, at this writing, is new and is accompanied by 




the initial Manual. The user of the test is promised, however, that
there will be at least one revision of the Manual during the first 
two years of its use. Several supplements containing additional 
norm data and research findings are also planned. One supplement 
appeared in 1959. The purchaser is advised that these revisions and 
supplements will be sent to him automatically and without charge. 
A tearout postal card is provided containing this statement: "Re-
turn of this service card, properly filled in, entitles you to contin-
uous and automatic supplement service through 1960 without
additional cost. Please send it now." This approach, used also in a 
few earlier tests, is commendable — a practice that all test publish- 
ers would do well to follow. 

TECHNICAL RECOMMENDATION 

A2.1 When new information emerges from investigations by the test 
authors or others, which indicates that some facts and recommenda- 
tions presented in the manual are substantially incorrect, a revised 
manual should be issued at the earliest feasible date. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This recommendation, at present, is not relevant. The promise 
of a new, revised manual within two years and periodic supple- 
ments containing additional research data, however, suggests that 
it is the intent of the publisher to keep the user abreast of the latest 
data, and to satisfy this recommendation. 

TECHNICAL RECOMMENDATION 

A2.3 The copyright date of the manual or the date of the latest 
revision should be clearly indicated. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

This standard is met. Two supplements containing additional 
norm data are similarly dated. 




B. INTERPRETATION 

TECHNICAL RECOMMENDATION 

B1.1 Names given to tests, and to scores within tests, should be
chosen to minimize the risk of misinterpretation by test purchasers
and subjects. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The test title, School and College Ability Tests, and explana- 
tions of intent are evidence that this recommendation has been 
met. Frequent mention is made of the basic purpose of the instru-
ment—that of "helping teachers and counselors — and students
themselves — to estimate the capacity of each individual student to
undertake the academic work at the next higher level of schooling.
The tests are measures of developed ability, indicative of the rela-
tive academic success the student is likely to achieve in his next
steps up the educational ladder." Considerable discussion is pre-
sented regarding "school-learned abilities." Those regarded as
most critical for the purpose at hand are clearly named and were
used in evolving the final items. Care too has been taken in point-
ing out that the test scores do not indicate the 'intelligence' or
'native capacity of the student,' that the abilities measured are not
to be interpreted as fixed or permanent characteristics of the stu-
dent, and that the scores on the tests do not, at this time, indicate
an individual's likelihood of success in vocational training or in
certain occupations. Further compliance with this recommendation
is found in the Manual's discussion of "Interpretation of Individ-
ual Scores and Group Distributions" (p. 52 App. A): "The Coop-
erative School and College Ability Tests are intended to measure
four of the school-learned skills which research has shown to be
closely related to academic success in school and college. These
tests are NOT measures of 'intelligence' in the sense that they
Ibid., p. 3. 




tap innate psychological characteristics, nor are they really meas- 
ures of ‘aptitude’ because aptitudes usually are regarded as fairly 
stable characteristics not greatly affected by instruction.” 

This statement, unfortunately, is somewhat buried in the Man-
ual. It might well have been given a more prominent place, most
desirably in the opening discussion of purposes and uses. In an-
other section related to interpretation, the user is cautioned against
overinterpretation (p. 20).

Even though the tests in the SCAT series have been constructed to 
yield scores of high reliability and the directions are quite specific as 
to what is measured and what is not measured, users of the tests are 
cautioned not to over-interpret the test scores. The tests measure well 
the things they are intended to measure, but educators who are not 
extensively trained in educational and psychological measurement 
often are tempted to "read into” their interpretation of any test scores 
some conclusions which the scores will not support. For this reason, 
the section of the “Directions” which contains suggestions on in- 
terpretation specified some of the things the tests do NOT meas- 
ure. . . . 

Thus the user is cautioned frequently in the Manual regarding 
interpretation and limits thereof. Here again the publishers must
be commended for the straightforward manner in which the pur-
poses and scores of the tests are presented. There appears to be no
attempt directly or by implication to represent the instrument as 
something it is not. 

TECHNICAL RECOMMENDATION 

B1.2 The manual or other accompanying material should describe
the process by which interpretations are to be derived from test scores.
VERY DESIRABLE.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS


If one could assume that tests would be used only by those who 




automatically consider other data in addition to test scores there 
would be no argument about the present practice of suggesting to
the user that the instrument at hand is something of a panacea for
all the frustrations, questions, and problems arising from working
with other people in an educational or counseling situation. But
this is a gross assumption. Considering the wide use and misuse of
psychological tests, and the fact that they may be obtained and
used by many who have no particular training or background in
testing, it would appear essential that all test manuals include
prominently a statement reminding the test user of the dangers of
interpretation in isolation. They must encourage consideration of the
factors that may have influenced the test performance and the types 
of data one might seek for evidences of contradiction. The SCAT, 
in terms of this standard, is only a slight improvement over those 
tests marketed prior to the appearance of the Technical Recom- 
mendations. There are a few token statements, however, that may 
suggest to the user that other factors must be taken into account. 
The following statement from page 8 of the Manual is made in 
the discussion of validity: "The test appears to be a 'work sample' 
measuring the ability of the students in several of the skills that 
are important to further academic learning. If these were the 
only abilities needed for success in school, and if the criterion of 
teacher-assigned grades were a reliable one, the validity coefficients
of a highly reliable test would be very close to 1.00. There are
many other factors also important to school success, however —
other academic abilities as well as habits and attitudes. . . ."
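
The reasoning in the quoted passage can be illustrated with the classical attenuation relation, under which an observed validity coefficient equals the correlation between true scores multiplied by the square root of the product of the two reliabilities. The sketch below assumes that classical model; the Manual itself does not state the formula, and the function name and the reliability figures are hypothetical.

```python
from math import sqrt


def observed_validity(true_correlation, test_reliability, criterion_reliability):
    """Validity coefficient expected under the classical attenuation model."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_correlation * sqrt(test_reliability * criterion_reliability)


# Hypothetical case 1: the tested abilities are all that matter (true
# correlation 1.00), the test is highly reliable, and grades are perfectly
# reliable. The coefficient comes out close to 1.00.
perfect_case = observed_validity(1.0, 0.95, 1.0)

# Hypothetical case 2: the same test, but teacher-assigned grades are only
# moderately reliable. The coefficient drops even though nothing about the
# test has changed.
fallible_grades = observed_validity(1.0, 0.95, 0.60)

print(round(perfect_case, 2), round(fallible_grades, 2))
```

Any further drop below the second figure would, on this model, reflect the other factors in school success that the passage mentions.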

That other factors need to be taken into account is also implied 
indirectly in the following discussion of norms, which appears on 
page 33 of Appendix A of the Manual.

A more useful set of norms can be built locally ... and are more
useful than national norms for most interpretations because they
reflect the scores of students who are much like the student now being
tested in a number of ways: (a) they are at about the same age in




each grade because they enter school and are promoted from grade to 
grade under a common system; (b) they are studying in the same 
curriculum; (c) they come from generally the same kinds of homes; 
(d) the over-all quality of their instruction is about the same; (e)
the cultural and academic advantages offered by the community are
similar; and (f) the school services available to them are the same. 
All of these things affect the test performance of students in some 
way. . . . 

The SCAT user, then, is reminded, at least indirectly, of some 
of the factors that need to be taken into account in interpreting 
the test results. It would be desirable to have these points, and 
others, stated more directly under a separate heading, but their 
very presence in any connection is encouraging. 
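
The point about local norms can be made concrete with a small sketch showing how the same score may stand quite differently in two norm groups. It assumes a mid-rank definition of percentile rank (half of any tied scores counted as below); the function name and all scores are hypothetical and are not taken from the SCAT norm tables.

```python
def percentile_rank(score, norm_scores):
    """Per cent of the norm group scoring below `score`, counting half of ties."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Hypothetical scores for a local group and a national sample, ten students each.
local_group = [268, 272, 275, 279, 281, 284, 288, 290, 293, 297]
national_group = [255, 260, 264, 268, 271, 274, 277, 281, 285, 289]

student_score = 281
local_pr = percentile_rank(student_score, local_group)
national_pr = percentile_rank(student_score, national_group)

print(local_pr, national_pr)
```

In this invented example the student stands well above the middle of the national sample but slightly below the middle of his own school group, which is the kind of difference the quoted passage has in mind.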

TECHNICAL RECOMMENDATION 

B1.22 When case studies are used as illustrations for the interpreta-
tions of test scores, the examples printed should include some
relatively complicated cases whose interpretation is not clear-cut.
VERY DESIRABLE.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This recommendation, as with the one previous, is regarded as
very desirable rather than essential. In a sense, then, the Manual
should not be required to meet this particular standard. Since the 
Manual does present an illustrative case, however, some comment 
seems appropriate. The case does not appear to be particularly 
complicated, and the interpretation is somewhat perfunctory, 
though it may aid the test user in some aspects of interpretation. 

It does tend to emphasize one of the strongest features of the
Manual, the need to interpret test scores in terms of ranges and 
confidence intervals. Again, however, the illustrative case is con- 
fined almost exclusively to test results and fails to introduce non- 
test data that may be of consequence in counseling. The value of 




the case would be greatly enhanced by having results integrated 
with other counseling data, since that is the way the results pre- 
sumably would be used. The ultimate implications of test results 
are not discussed in any detail. 
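
Interpreting a score as a range rather than a point can be sketched with the usual standard error of measurement, taken here as the standard deviation times the square root of one minus the reliability. This is an assumption for illustration only; the way the Manual itself builds its bands is not reproduced here, and the function name and figures are hypothetical.

```python
from math import sqrt


def score_band(score, sd, reliability, z=1.0):
    """Band of plausible scores: score plus or minus z standard errors of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sem = sd * sqrt(1.0 - reliability)
    return score - z * sem, score + z * sem


# Hypothetical figures: obtained score 280, standard deviation 10, reliability .91.
low, high = score_band(score=280, sd=10.0, reliability=0.91)
print(round(low, 1), round(high, 1))
```

Two students whose bands overlap would not, on this view, be treated as differing in the ability measured, which is the caution the illustrative case is meant to reinforce.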

TECHNICAL RECOMMENDATION 

B2. The test manual should state explicitly the purposes and applica- 
tions for which the test is recommended. ESSENTIAL 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The statement of purposes of the test presented in the Manual 
is typical of most such statements. It is interesting to note, how- 
ever, that the Manual, in presenting the possible applications of 
the tests, suggests that they can "aid" the teacher, that the coun-
selor can "apply" the results, and that the data from the test results
can "help" the administrator. This is a refreshing contrast to many
test manuals which state rather unequivocally that the tests will do 
certain things rather than indicate that they may help in doing 
them. The limitations implied by the verbs employed, however, 
are likely to be overlooked by many users. Greater emphasis could 
be given to this point. Though perhaps not wholly relevant here, 
it appears to be assumed that the test user will have the know-how 
required to use the results in the various ways suggested. It would 
be helpful to have the stated uses supplemented with examples. 

TECHNICAL RECOMMENDATION 

B3. The test manual should indicate the professional qualifications
needed to administer and interpret the test properly. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The SCAT tests, it is pointed out in the Manual, were "devel- 
oped and arranged in the expectation that they will be adminis- 




tered, scored, and interpreted mostly by teachers who have had 
little or no formal training in testing.” It is believed by the authors 
that any teacher or counselor who takes time to study the directions 
and follow them will do as well as a test expert. This seems rea-
sonable, of course, since a test expert could do no more. As indi-
cated in El below, the treatment of directions for administering 
the test is very thorough. It is believed by the publishers that coun- 
selors and others with training in measurement should have little 
difficulty in using and interpreting the test. 

Some control over use of the SCAT is attempted through limi- 
tations of purchase established by the publisher. "Administrators, 
college teachers, and professionally qualified advisors in recog-
nized schools . . . letterhead or purchase
order forms. High school teachers can similarly purchase the tests
with written approval of their administrators, graduate students
with approval of their instructors. Staff members of other organi-
zations and individuals in private practice having a master’s de-
gree in psychology or education, or equivalent in training and
experience can purchase upon presenting their qualifications. The 
company reserves the right to accept or reject orders ... in con- 
formity with professional standards." (Manual, page 57.) 

TECHNICAL RECOMMENDATION 

B3.11 The manual should not imply that the test is "self-interpret-
ing," or that it may be interpreted by a person lacking proper train-
ing. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The Manual does not imply in any sense that the test is self- 
interpreting. It does imply that teachers in general should be able 
to interpret the tests by reading the Manual. It is somewhat incon- 
sistent, however, in suggesting as above, that those with training 




in measurement should have no difficulty in interpreting the test,
but those "who have not acquired extensive training in measure-
ment" should follow the Manual in "cookbook" fashion. While
the manuals should be as complete as possible, it seems unlikely
that they will ever be detailed enough so that a straight "cook-
book" approach will be sufficient. Perhaps much of the misuse of
tests can be attributed to the attempts on the part of test publishers
to make tests simple and to imply that anyone who can read can
also give tests. While the Manual is more cautious about this point
than the manuals of many other tests, the cause of testing might
better have been served by urging that all users be trained in
measurement.

TECHNICAL RECOMMENDATION 

B3.12 The manual should point out the counseling responsibilities 
assumed when a tester communicates interpretations about ability or 
personality traits to the person tested. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The value of any test would be greatly enhanced if the manual 
would remind the user of the implications of the results, and that 
other data must be considered when interpreting results to an indi- 
vidual. The user must be reminded that test results should not be 
interpreted in isolation and that the student should not be left in 
a state of suspension. The counselor must be sure that the coun-
selee is following his interpretation and that he has a grasp of per-
centile or other statistical concepts that may be used. In this regard,
the SCAT Manual leaves room for improvement. If the authors
assumed some of these points, this is not made clear in the Man-
ual. It does not appear to contain any direct acknowledgment in
the form of discussion of this standard. One of the stated counsel-
ing uses of the test, that of helping the students "to understand
his own strengths and weaknesses . . . ," implies interpretation




of results to the student. While no direct comment is made as to
how this is to be done, there is nothing in the Manual that should 
lead the test user to believe students can or should make their own 
interpretations. 


TECHNICAL RECOMMENDATION 

B5. Statements in the manual reporting relationships are made by 
implication quantitative, and should be stated as precisely as the data 
permit. If data to support such a statement have not been collected, 
that fact should be made clear. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

What this standard appears to demand is some expression of 
the degree to which (in this case) further academic learning at a 
given level depends upon the abilities measured by the test. If one 
expects to find a statement in the Manual to the effect that "90
percent of academic success is due directly to the factors measured 
by this test," he will not find it. The problem of relationship of 
test scores to academic performances has long confronted test au- 
thors, counselors, teachers, registrars, and others. The SCAT does 
not appear to provide, at this time, any better solution to the prob- 
lem than other tests. The authors do not, however, claim perfect 
correlation between SCAT scores and further academic learning. 
They state only that it measures "several skills that are impor- 
tant." The validity coefficients presented are not significantly 
higher than those obtained previously by use of other tests. Results
of further studies in progress are promised to those who use the 
test as soon as they are known. 

TECHNICAL RECOMMENDATION 

B5.2 The manual should clearly differentiate between an interpreta-
tion justified regarding a group taken as a whole, and the application
of such an interpretation to each individual within the group. ESSEN-
TIAL.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

One of the strong features of this test is the emphasis placed on 
careful interpretation and the caution against overinterpretation. 
This is strengthened by the introduction of the "confidence inter-
val" concept. The reader is urged to interpret scores as a "band of
possible test scores" rather than as an exact point. The standard
error of measurement is built into the scoring and recording pro-
cedure. It places more usable limits on individual score interpre-
tation than those afforded by the coefficients of reliability that are
usually presented in test manuals. The careful reader will note the
explanation that the interpretation of this approach is that "the
chances are two-to-one that the student's 'true score' lies within this
interval." This point could be strengthened, of course, by a state-
ment that there is still one chance in three that the student's true
score would fall beyond the interval. This standard is acknowl-
edged in another statement on page 11 of the Manual: "The user of
the School and College Ability Tests cannot be reminded too em-
phatically that the scores of individuals on the four subtests should
NOT be interpreted separately. The part scores and total scores
. . . are reliable enough for individual use, but separate subtest
scores should never be recorded for individuals." (Italics theirs.)
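
The two-to-one odds quoted above correspond to a band of one standard error of measurement on either side of the obtained score, provided errors of measurement are assumed to be normally distributed. The following Python sketch works through that arithmetic. The function names, the obtained score of 290, and the standard error of 4 are illustrative assumptions and are not taken from the Manual.

```python
import math

def normal_coverage(k):
    """Probability that a normally distributed error lies within +/- k standard errors."""
    return math.erf(k / math.sqrt(2.0))

def score_band(obtained, sem, k=1.0):
    """Band of possible scores: obtained score plus or minus k standard errors of measurement."""
    return (obtained - k * sem, obtained + k * sem)

inside = normal_coverage(1.0)   # roughly two chances in three
outside = 1.0 - inside          # roughly one chance in three
low, high = score_band(290, 4)  # hypothetical obtained score and SEM

print(f"band {low} to {high}: inside {inside:.2f}, outside {outside:.2f}")
```

Under these assumptions the chance of the true score falling outside the band is a little under one in three, which is the complementary statement the text suggests the Manual might have added.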

C. VALIDITY


TECHNICAL RECOMMENDATION 

C1. When validity is reported, the manual should indicate clearly
what type of validity is referred to. The unqualified term "validity"
should be avoided unless its meaning is clear from the context.
ESSENTIAL.



MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The SCAT Manual, in presenting validity data, quotes directly
from the Technical Recommendations and organizes its material
around the four types of validity defined in the document as con-
tent, predictive, concurrent, and construct.

TECHNICAL RECOMMENDATION 

C2. The manual should report the validity of each type of inference 
for which a test is recommended. If validity of some recommended 
interpretation has not been tested, that fact should be made clear.
ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The manner in which this standard is met by the SCAT Manual
is, in general, good. The Manual presents, in a specifically titled
section, the inferences that presumably are supported by data and 
expert opinion, those that are supported by assumption of validity, 
and inferences "of the kind which are NOT supported by any 
evidences of validity." The Manual cautions the user to avoid 
these latter in every case. The Manual offers three inferences that 
the arguments and data are intended to support. One is based on 
"expert opinion," one on estimated concurrent validity coefficients, 
and one on a combination of the two. It is commendable that the 
Manual makes a clear distinction between the inferences. It is well, 
too, that the test was conceived with the help of much expert opin- 
ion. This makes a good starting point. It is unfortunate that the 
authors of the test, at the time of its introduction, did not present 
more adequate statistical evidence in support of the inferences pre- 
sumably based on the expert opinion and on concurrent validity 
coefficients. 




TECHNICAL RECOMMENDATION 

C2.1 The manual should indicate which, if any, of the interpretations
usually attempted for tests such as the one under discussion have not 
been substantiated or are based merely on clinical impressions. 
ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

In addition to advising the user regarding inferences that the
test is and is not intended to support, the Manual clearly states
that predictive validity "is as yet only assumed" and that con-
current validity, as reported, is estimated (on the basis of ex-
perimental tests) and that further studies are under way. In the
light of these statements, this standard is adequately met.

TECHNICAL RECOMMENDATION 

C3. Findings based on logical analysis should be carefully dis- 
tinguished from conclusions established by correlation of test behavior 
with criterion behavior. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

There appears to be no attempt to present findings based on 
logical analysis alone, that is, where items representing specific 
areas were assembled and validity assumed on the efficacy of the 
items themselves. The evidence of content validity is rather based 
on a combination of factors (including concurrent validity) as 
noted in the following statement: 

Thus, by logical and empirical means, there were developed and tried 
out a series of practical measures of school-learning ability having the 
following characteristics in content validity: 

a. Measurement of developed abilities rather than innate psychologi- 
cal traits; 

b. Measurement of abilities which a committee of noted educational
researchers recommended as most closely related to success in
school learning; 

c. "Face validity" in the sense that students and parents can see in
the test content a measurement of abilities related to school learn-
ing;

d. Relatively high correlations with school marks assigned by
teachers;

e. Sufficiently low intercorrelations between the scores on the verbal
and quantitative parts of the test to indicate measurement of
somewhat different abilities.**

Logical analysis was employed in the initial selection of the ex- 
perimental test content, but conclusions regarding content validity 
are based heavily on the degree to which test types correlated with 
class grades in the areas they were supposed to measure. None of 
the content validity data is based on the final form of the test. 
This point is made on page 6 of the Manual as follows: "Although 
these data, obtained with experimental forms of the test, are the 
only concurrent validity coefficients offered at the time of publica-
tion, it is likely that they are minimum estimates." (Italics theirs.)

It is unfortunate that the data are not based on the final form 
of the test, that the user does not have actual rather than esti- 
mated coefficients, and that he does not have some evidence that 
they are minimum. At the same time, it is commendable that much 
of the discussion of content validity data is couched in tentative 
rather than positive terms where data are lacking. If the reader 
will heed such statements as "these characteristics suggest that the 
test probably will have sufficiently high content validity to make it 
useful . . ." (italics added), he will perhaps exercise appropriate
caution when using the test and save his enthusiasm for the time 
when more positive evidence becomes available. 

** Ibid., p. 8.

TECHNICAL RECOMMENDATION

C4. If a test performance is to be interpreted as a sample of perform-
ance in some universe of situations, the manual should indicate
clearly what universe is represented and how adequate the sampling
is. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This standard is most applicable to tests of achievement or pro-
ficiency in some subject area, e.g., mathematics, reading. The SCAT
is not intended to be a test of this kind. At the same time, almost
all tests are samples of a performance in some universe of situa-
tions and in this case the universe appears to be "school-learned
abilities" or skills. The four subtests comprising the final form of
the test were screened by statistical and logical analysis from nine
"abilities" recommended for such screening by the advisory com-
mittee. These nine abilities constituted in one sense the "universe"
sampled by the final test. They included Resourceful Computation,
Reading Comprehension, Sentence Completion, Analogies, Routine
Computation, Data Sufficiency, Vocabulary, Mixed Computation,
and Arithmetic Reasoning. While certain performances may be
inferred from these test types or abilities, no specific description of
their content is offered. Since the final form of the test consists
of only four of the nine abilities (Routine Computation, Arith-
metic Reasoning, Sentence Completion, and Vocabulary), only
a partial sample appears to have been taken. This limitation of
sampling is recognized in the Manual, however, and the test is not
represented as measuring all the school-learned abilities needed in
further academic learning.

TECHNICAL RECOMMENDATION

C4.1 The universe of content should be defined in terms of the
sources from which items were drawn, or the content criteria used to
include and exclude items. ESSENTIAL.

C4.2 The method of sampling items within the universe should be
described. ESSENTIAL.





MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Here again the standard applies most obviously to tests of sub- 
ject matter achievement or proficiency, but has some bearing on the 
SCAT. The Manual indicates only that "test items were designed 
to measure these nine abilities and a test of each type was built 
for experimental use” (Manual, p. 5). 

A 30-item vocabulary subtest is included in the final form, but
no mention is made in the Manual as to the source of vocabulary 
items. Arithmetic reasoning and routine computation items are 
used in two of the subtests, but again, neither the source nor the 
criteria for selecting the items are presented. The reader must as- 
sume that these items are the most suitable for measuring the 
"abilities” that the test is designed to sample. The criteria used to 
select the best combination of test types for inclusion in the final 
form of the test are presented in the Manual. 

TECHNICAL RECOMMENDATION 

Predictive validity. Of 67 recommendations relating to all 
types of validity, 47 related specifically to the predictive type. They 
are designed also to cover concurrent validity. Because the SCAT 
manual does not present evidence of predictive validity (see accom- 
panying comment), no direct analysis can be made. Some of the 
recommendations relevant to predictive validity are presented in the 
analysis which follows. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

No direct evidence of predictive validity is offered the SCAT 
user in the first Manual. This is the case in spite of the fact that 
one of the major purposes of the test (Manual, p. 3) is predictive 
in nature — "to estimate the capacity of each individual student to 
undertake the academic work of the next higher level of school-
ing." The absence of predictive validity data precludes at least one
of the suggested counselor uses, to "guide the student toward
choices of educational goals and courses most appropriate for him"
(Manual, p. 2). This is not to say, however, that the Manual ig-
nores the need for such data. It acknowledges the problem of
securing data on predictive validity in the following paragraph 
on page 8. 

The usefulness of tests in the SCAT series as predictors of future 
school or college success is as yet only assumed. It is reasonable to
expect that measures of this type having respectable concurrent
validity will also predict individual success well enough to be useful 
for prognostic purposes, for tests of these kinds have proved to be 
predictive in other forms and uses, but data on predictive power must 
be collected over a period of time and the prediction studies of these 
tests have not been completed at the time the series is first published. 
Prediction studies were initiated at each of several grade levels as
soon as the final content of the tests had been determined; the results
of these studies will be added to the manual in supplements as soon 
as they are known. Predictive validities for varying periods of time — 
from eight months to four or five years — and for different educational 
criteria will be reported.

Promises of things to come are of little comfort to the user who 
needs the evidence now if he is to use the test for predictive pur-
poses. While one can deplore the absence of the data, the authors
of the Manual must be commended for their treatment of the 
point. The need and absence are at least pointed out to the user. 
The authors could have as easily ignored the point, hoping that 
the users would, too. 

Concurrent validity. On page 19 of the Manual one finds 
the following statement regarding concurrent validity: 

The validity coefficients presented in the section on "content validity" 
are estimated coefficients of concurrent validity. That is, they are the 
statistically combined concurrent validities of the experimental test 
types that were used to determine the final forms in the series.
Although these estimates can be regarded as accurate and useful for 
most practical purposes, further studies of the concurrent validity of 
the final forms are under way. The results of these studies, confirm-
ing and extending the original data, will be reported in an early
supplement to the manual. 

Thus the user is asked to accept the evidence of concurrent valid- 
ity on the experimental tests. While it is probable that further 
studies on the final form of the test will confirm the findings of 
the experimental tests, it is unfortunate that the user does not have 
actual evidence to help him estimate the usefulness of the tests.
In reading the following analysis of concurrent validity, the reader 
will keep in mind that the data are based on estimated coefficients. 

TECHNICAL RECOMMENDATION 

C5.1 Statistical procedures which are well known and readily inter-
preted should be used in reporting validity whenever they are appro-
priate to the data under examination. Any uncommon statistical
techniques should be explained. ESSENTIAL.

C5.11 Reports of statistical validation studies should ordinarily be
expressed by: (a) correlation coefficients of familiar types; (b) 
description of the efficiency with which the test separates groups, 
indicating amount of misclassification or overlapping; (c) expectancy 
tables. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Coefficients of correlation between test scores and grades are 
presented, but the method of computation is not described. It is 
probable that the usual Pearson correlation technique was em- 
ployed, but this can only be assumed. 
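
For readers who wish to see what "the usual Pearson correlation technique" involves, a short Python sketch follows. It is offered only as an illustration of the product-moment formula; the function name and the test scores and grade averages are invented for the example and do not come from the Manual, which does not describe its method of computation.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test scores and grade averages for six students
scores = [40, 52, 47, 61, 55, 38]
grades = [2.1, 2.9, 2.5, 3.6, 3.0, 2.0]
r = pearson_r(scores, grades)
print(round(r, 2))
```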

TECHNICAL RECOMMENDATION 

C5.2 An over-all validity coefficient should be supplemented with
evidence as to the validity of the test at different points along the
range, unless the author reports that the validity is essentially constant
throughout. VERY DESIRABLE.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This standard is not met by the SCAT Manual.

TECHNICAL RECOMMENDATION 

C5.3 Test manuals should not report coefficients corrected for unrelia-
bility of the test as estimates of predictive validity. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

There is no evidence that the reported coefficients have been 
corrected for attenuation. 
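
The reason for this recommendation can be seen from the classical correction for attenuation, in which an observed validity coefficient is divided by the square root of the product of the reliabilities involved. The Python sketch below assumes that standard formula; the function name, the observed coefficient of .55, and the reliability of .90 are hypothetical figures chosen for illustration and are not SCAT data.

```python
import math

def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Classical correction: r_xy / sqrt(r_xx * r_yy).

    r_xx is the reliability of the test, r_yy that of the criterion.
    Leaving either at 1.0 means no correction is made for that measure.
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie between 0 and 1")
    return r_xy / math.sqrt(r_xx * r_yy)

observed = 0.55                                         # hypothetical observed validity
corrected = correct_for_attenuation(observed, r_xx=0.90)  # corrected for test unreliability only
print(round(observed, 2), round(corrected, 2))
```

Because the divisor is never greater than one, the corrected figure is never smaller than the observed one. A coefficient corrected in this way describes a perfectly reliable test that the user does not possess, which is why it should not be offered as an estimate of predictive validity.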

TECHNICAL RECOMMENDATION

C6. All measures of criteria should be described accurately and in 
detail. The manual should evaluate the adequacy of the criterion. It 
should draw attention to significant aspects of performance which the
criterion measure does not reflect and to the irrelevant factors which
it may reflect. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The criteria used in determining validity coefficients were total
grade averages, English grades, and mathematics grades in Grade
9, and total grades and English grades in Grade 12. The coefficients
were computed from scores obtained from students in 19 high
schools located in eight different states. The schools were divided
into "High," "Medium," and "Low" categories on the basis of
per-pupil investment in education by the community. The actual
differences between these categories are not described. The Manual
does not offer a description of the size of the schools, their gen- 
eral philosophy, or objectives, nor is the nature of the classes in 
English or mathematics presented. The test user will have few 
data upon which to base judgment as to the representativeness of 
the schools included. 

TECHNICAL RECOMMENDATION 

C6.5 The time elapsing between the test and determination of the
criterion should be reported. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The exact time lapse is not given in the Manual, though it is 
stated that the validity coefficients are based on grades earned dur-
ing the same semester in which the experimental tests were given.
It seems unlikely that short lapses of time would affect the results 
greatly in a test of this kind, as might be the case in achievement 
tests where much learning may take place during the time lapse. 

TECHNICAL RECOMMENDATION 

C7. The reliability of the criterion should be reported if it can be 
determined. If such evidence is not available, the author should 
discuss the probable reliability as judged from indirect evidence. 
VERY DESIRABLE. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The reliability of school marks used in the determination of the 
validity coefficients is not reported, but their unreliability is con- 
ceded. The determination of reliability in each of the schools 
would in itself be something of a major undertaking. It is unlikely 
that such data, if gathered, would alter the generally known fact
that grades are unreliable. The Advisory Committee's recommen-
dation that the test should measure school-learned abilities was
based in part on the observation "that the best single predictor of
how well a student is likely to succeed in his school work next year
is how well he is succeeding this year" (Manual, p. 5). Grades,
however unreliable in general, appear to be our best evidence as to 
how well the student is doing in school. 

TECHNICAL RECOMMENDATION 

C8. The date when validation data were gathered should be reported.
ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

This is given only as 1933. As above, validation data on a test
of this type are not likely to change radically over a period of a year
or so, but they might over a longer period as curricular changes are
made.


TECHNICAL RECOMMENDATION 

C9. The criterion score of a person should be determined inde- 
pendently of his test score. The manual should describe precautions 
taken to avoid contamination of the criterion or should warn the
reader of any possible contamination. ESSENTIAL. 

C9.1 When the criterion consists of a rating, grade, or classification
assigned by an employer, teacher, psychiatrist, etc., the manual must
state whether the test data were available to the rater or were capable
of influencing his judgment in any way, e.g., indirectly through other
reports of the psychologist. ESSENTIAL.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The Manual is careful to note that grades were assigned to the
students involved in the validation study before the teachers knew
their test scores. 


TECHNICAL RECOMMENDATION 

C13. The validation sample should be described sufficiently for the
user to know whether the persons he tests may properly be regarded
as represented by the sample on which validation is based. ESSEN-
TIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

See C6 above. 


TECHNICAL RECOMMENDATION 

C13.1 The user should be warned against assuming validity when the
test is applied to persons unlike those in the validating sample. 
ESSENTIAL. 

C13.3 The number of cases in the validation sample should be re-
ported. The group should be described in terms of those variables
known to be related to the quality tested: these will normally include
age, sex, socioeconomic status, and level of education. Any selective
factor which restricts or enlarges the variability of the sample should
be indicated. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Such a warning is not given directly, but may be implied in a 
discussion of the desirability of local norms. After listing the ways 
in which students may differ in terms of educational opportunity 
the following statement (C13.3) is made on page 33 of Appendix 
A of the Manual. 

"All these things affect the test performance of students in some
way, so that to estimate how well a student is doing with what he
has to start with it is important to compare him with others who
have approximately the same start." This is a rather hollow stand-
ard in the case of the SCAT, however, for the persons in the vali-
dating sample are not described by age, range, or sex, and the
implications of levels of "investment in education by the com-
munity" are not indicated. The number of cases is reported.

TECHNICAL RECOMMENDATION 

C17. Reports of concurrent validity should be so described that the
reader will not regard them as establishing predictive validity. ES-
SENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The discussions of the various types of validity are handled
separately in the organization of the Manual. Some confusion does
result from the inclusion of some concurrent validity data with
that of content validity. Concurrent and predictive validity data,
however, are adequately separated.

Construct validity. Construct validity as a type is acknowl-
edged on page 9 of the SCAT Manual with the following state-
ment: "Since the comparative usefulness and accuracy of tests like
those in the SCAT series can be demonstrated in terms of concur- 
rent and predictive validity data — using criteria, which ate varied, 
objective, and specific — there is no real need to investigate their 
construct validity. Studies of construct validity by interested users 
of the tests will be welcomed by the publisher as sources of inter-
esting information about the tests, but no such studies will be un-
dertaken to prove the worth of the instruments."

Considering the nature of construct validity, the publisher’s ar- 
guments appear reasonable. Since the test is designed to replace the 
American Council on Education Psychological Test, however, it 
would be of interest to know how the two correlate. Such com- 



187 


The Use of Standards in Test Selection 

parisons would fall within the scope of construct validity. No such 
comparisons are presented. 

D. RELIABILITY 


TECHNICAL RECOMMENDATION 

D1. The test manual should report such evidence of reliability as
would permit the reader to judge whether scores are sufficiently de- 
pendable for the recommended uses of the test. If any of the necessary 
evidence has not been collected, the absence of such information 
should be noted. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The Manual of the SCAT is unusually thorough in this respect,
reporting reliability estimates as coefficients of internal consistency
and coefficients of equivalence, with a good deal of empirical evi-
dence. A careful statement is made that no coefficients of stability
are available at the time of printing, but that such studies are under 
way and will be reported at a later date in supplements to the 
Manual. 


TECHNICAL RECOMMENDATION 

D1.1 Recommendation D1 applies to every score, subscore, or combi-
nation of scores whose interpretation is suggested. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

A special note is given in boldface type on page 11 of the Man-
ual of the SCAT. It warns the test user not to forget that the
scores of individuals on the four subtests should NOT be inter- 
preted separately for individuals. The profile sheet does not allow 
room for the recording of these part scores to emphasize further 
the need to avoid this common error. 




TECHNICAL RECOMMENDATION 


D1.2 If differences between scores are to be interpreted or if the
plotting of a profile is suggested, the manual should report the relia-
bility of differences between scores. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The special construction of the test score profile sheet for the 
SCAT features a "confidence interval," which is portrayed by 
shaded lines so as to make it distinctive to the viewer. The follow- 
ing statement is made both in the Manual (page 11) and on the 
test profile sheet: "Use of the confidence interval makes it possible
to note educationally important differences between the two scores
(verbal and quantitative) at a glance. If the confidence intervals of
the verbal and quantitative scores of an individual overlap, there is
probably no educationally significant difference between the two
scores and for interpretative purposes you can regard them as being
equal. If the confidence intervals around the verbal and quantita-
tive scores of an individual do NOT overlap, the chances are about
5-to-1 that an educationally important difference exists between
the two scores."
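
The overlap rule quoted above is simple enough to state as a short procedure. The Python sketch below is one way of expressing it; the function names, the scores, and the half-widths of the bands are hypothetical and are not values taken from the SCAT profile sheet.

```python
def band(score, half_width):
    """Confidence band as a (low, high) pair around an obtained score."""
    return (score - half_width, score + half_width)

def bands_overlap(a, b):
    """True when two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical verbal and quantitative scores for two students
student_one = (band(285, 5), band(297, 6))   # bands 280-290 and 291-303
student_two = (band(285, 5), band(293, 6))   # bands 280-290 and 287-299

for verbal, quantitative in (student_one, student_two):
    if bands_overlap(verbal, quantitative):
        print("bands overlap: treat the two scores as equal")
    else:
        print("bands do not overlap: a difference probably exists")
```

For the first student the bands do not touch and a difference would be inferred; for the second they overlap and the two scores would be regarded as equal.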


TECHNICAL RECOMMENDATION 

D1.5 Reports of reliability studies should ordinarily be expressed in
terms of: (a) the product-moment correlation coefficient; (b) another
standard measure of relationship suitable to categorical judgments; or
(c) the standard error of measurement. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Both coefficients and error estimates are reported in the SCAT
Manual. The reliability estimates for all subtests, the two major
parts (verbal and quantitative), and total scores are all reported.
The Kuder-Richardson reliability coefficients are in the range of
.82 to .88 for the part scores, and the Manual warns accordingly
that these are not sufficiently high for use in individual test score
interpretations. In a table reporting this information the standard
error of measurement of each subtest, part, and total score is also
reported. This is given in raw score points and ranges from 2.0 for
some of the subtests to 4.3 raw score points for the total score. This
information is provided for both the high school "School Ability
Test, Form 2A" and the college level "College Ability Test, Form
1C." A sample of 370 cases was used with each reliability study.
The total score reliability estimate is .95, which is regarded, quite
properly, as being high enough to use with an individual case. It
is in individual cases that the counselor must seek some degree of
confidence that the present score represents a close approximation
of his counselee's true score.
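
The reliability coefficient and the standard error of measurement reported above are tied together by the familiar relation SEM = SD x sqrt(1 - r). The Python sketch below applies that relation to the two total-score figures cited in the text (.95 and 4.3 raw score points). The standard deviation it yields is only what those two figures imply when taken together; it is not a value reported here from the Manual, and the function names are illustrative.

```python
import math

def sem_from_reliability(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def implied_sd(sem, reliability):
    """Standard deviation implied by a reported SEM and reliability coefficient."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be at least 0 and less than 1")
    return sem / math.sqrt(1.0 - reliability)

total_sd = implied_sd(4.3, 0.95)                 # about 19.2 raw score points
check = sem_from_reliability(total_sd, 0.95)     # recovers the reported 4.3
print(round(total_sd, 1), round(check, 1))
```

The same relation shows why a lower coefficient widens the band of possible scores: for a fixed standard deviation, a reliability of .85 gives a standard error about 1.7 times as large as a reliability of .95.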

TECHNICAL RECOMMENDATION 

D2. The manual should avoid any implication that reliability meas-
ures demonstrate the predictive or concurrent validity of the tests.
ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The section on reliability in the Manual of this test is introduced 
almost word for word with the general presentation of the topic 
of reliability in the Technical Recommendations. After this general 
discussion, various types of reliability information are presented. 
Nothing is presented in this section that might lead the test user
astray.


TECHNICAL RECOMMENDATION 

D3. In reports of reliability, procedures and samples should be
described sufficiently for the reader to judge whether the evidence
applies to the persons and problems with which he is concerned.
ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Coefficients of internal consistency are reported as the best type
of evidence of reliability of this test. Information is reported for
both the high school form, SAT, and the college form, CAT. As
an example, Form 1C of the College Ability Test was given to 604
freshmen in 15 colleges, which are listed in Appendix E of the
Manual. No information is given as to how these 604 students
were selected from among the 1,494 freshmen reported as attend-
ing these colleges. A stratified random sample was drawn propor-
tional to the size of each of the 15 colleges, and 570 cases were
selected from the 604 for reliability analysis. Comparisons of the
total (N = 604) population with the sample are made in terms of
mean and standard deviation for the verbal, quantitative, survey,
and total scores. Each of these scores and four other subtest scores
are reported in terms of their Kuder-Richardson reliabilities and
their standard errors of measurement. Some question might be
raised as to the selection of colleges. One would like to assume they
were chosen for their representativeness rather than their availa-
bility, but the absence of well-known, large institutions would raise
some question as to the appropriateness of these data for general
colleges everywhere. Average enrollment of these 15 colleges was
463, and seven of the 15 were described as sectarian in their stated
aims and affiliations.** Further reliability studies would greatly
strengthen this section of the data.

** Mary Irwin. American Colleges and Universities. Washington, D.C.: American

TECHNICAL RECOMMENDATION

D3.2 The reliability sample should be described in terms of any
selective factors related to the variable being measured, usually in-
cluding age, sex, and educational level. Number of cases of each type
should be reported. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Educational level is the only factor identified in the Manual or 
its appendixes. While sex and age data are probably not vital in 
tests of this sort, a more complete breakdown of the reliability 
analysis group to portray their identifying characteristics would be 
desirable. 


TECHNICAL RECOMMENDATION 

D3.5 Appropriate measures of central tendency and variability of the 
test scores of the reliability sample should be reported. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

This has been done faithfully as indicated in several of the sec-
tions above. The actual means of the analysis sample are from 2 to
4.2 raw score points higher than the population from which they
are drawn, but since the SD's are approximately the same it can be
assumed that this mean difference is within the limits of chance 
and therefore not statistically significant. 

EQUIVALENCE OF FORMS 

TECHNICAL RECOMMENDATION 

D4. If two forms of a test are made available, with both forms
intended for possible use with the same subjects, the correlation
between forms and information as to the equivalence of scores on the
two forms should be reported. If the necessary evidence is not
provided, the manual should warn the reader against assuming
comparability. ESSENTIAL.





MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The coefficient of internal consistency is the type of reliability
measure chosen by the authors of this test to describe the reliability
of the tests in the SCAT series. The only actual coefficients of
reliability reported are of this kind.

TECHNICAL RECOMMENDATION 

D5.1 When a test consists of separately scored parts or sections, the
correlation between the parts or sections should be reported.
ESSENTIAL.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

This is presented in a table for each of the two major levels of
the SCAT tests, the college CAT and the high school SAT.
Intercorrelations are presented for only one form of each of these two
levels. The intercorrelations are rather high (.53 to .62) between
the verbal and quantitative scores. This would minimize the use of
these subtests for diagnostic or predictive purposes. This point is
reinforced in the Manual several times.

TECHNICAL RECOMMENDATION

D5.11 If the manual reports the correlation between a subtest and a 
total score, it should point out that part of this correlation is an arti- 
fact. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

No discussion of the data presented in the table of intercorrelations
is given. The reader is left to draw his own conclusions. The
intercorrelations between the subtests and the total score are high,
.80 to .89, which are as high as coefficients often reported for
reliability estimates in many other tests on the market. This should
again warn the counselor that the most useful score for this
measure is the total score.
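
The artifact mentioned in recommendation D5.11 can be shown with simulated data. The sketch below is an illustration only: it draws two hypothetical subtest scores that are independent of each other, adds them to form a total, and then correlates one part with the total. The names, sample size, and score distributions are assumptions and do not come from the SCAT data.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Two hypothetical subtests generated independently of one another.
verbal = [rng.gauss(50, 10) for _ in range(5000)]
quantitative = [rng.gauss(50, 10) for _ in range(5000)]
total = [v + q for v, q in zip(verbal, quantitative)]

r_parts = pearson(verbal, quantitative)   # close to zero by construction
r_part_total = pearson(verbal, total)     # substantial, because verbal is part of total

print(round(r_parts, 2), round(r_part_total, 2))
```

Even though the two parts are unrelated here, each correlates roughly .7 with the total simply because it is contained in the total. Part of any reported subtest-total correlation must therefore be discounted.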

TECHNICAL RECOMMENDATION

D6. Coefficients of internal consistency should be determined by the 
split-half method or methods of the Kuder-Richardson type, if these 
can properly be used on the data under examination. Any other 
measure of internal consistency which the author wishes to report in 
addition should be carefully explained. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Kuder-Richardson (Formula 20) was used in computing each
reported coefficient of internal consistency. The basic assumptions
in the use of this specialized formula must be met and clearly
stated or there are apt to be some distortions in the values obtained.
Of the needed qualifications for use of this formula, only one, the
effect of speeded tests upon it, has been adequately stated in the
Manual. (See D6.1 below.) The effect of heterogeneity of item
content is to yield higher reliability estimates than might be gained
otherwise. No mention is made of this possibility in the Manual.
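
For readers who wish to see how a coefficient of this type is computed, the following is a minimal sketch of Kuder-Richardson Formula 20, r = [k / (k - 1)] x [1 - (sum of pq) / (variance of total scores)]. It assumes items scored 1 (right) or 0 (wrong) and uses the population form of the variance; the small response matrix is hypothetical and serves only to make the arithmetic visible.

```python
def kr20(item_matrix):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per examinee)."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption of this sketch).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of five examinees to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))
```

In this example the total-score variance is 2.0 and the sum of the item variances is 0.8, so the coefficient works out to (4/3)(1 - 0.4) = .80.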

TECHNICAL RECOMMENDATION 

D6.1 For time-limit tests, split-half or analysis of variance coefficients
should never be reported unless: (a) the manual also reports
evidence that speed of work has negligible influence on scores; or (b)
the coefficient is based on the correlation between parts administered
under separate time limits. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The Manual of the SCAT provides a detailed discussion of the greater
desirability of power over speeded tests in the measurement of
educational ability. A table is presented that itemizes the percents
of students in the norm analysis groups who completed 100 percent and
75 percent of each of the items as well as an actual item count of the
number of items in each subtest reached by 80 percent of the sample.
All these figures are high enough to suggest that power rather than
speed is essentially being tapped in the SCAT series. As might be
expected, slightly more stress is placed on speed in the arithmetic
subtests than in the verbal tests.

STABILITY 


TECHNICAL RECOMMENDATION 

D7. The manual should indicate what degree of stability of scores
may be expected if a test is repeated after time has elapsed. If such
evidence is not presented, the absence of information regarding
stability should be noted. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The Manual clearly states that while no stability data are available
at time of publication, studies are under way that will be reported
later. This information will be watched for with a good deal of
anticipation since the stated purpose of the SCAT series is the
continuity of measurement over a long range of years. If this effort
is successful, the Educational Testing Service will have filled in a
real gap in the field of testing. They will also have provided a real
tool for the counselor who is, above anything else, interested in
growth and prediction of future success.

E. ADMINISTRATION AND SCORING 

TECHNICAL RECOMMENDATION 

E1. The directions for administration should be presented with
sufficient clarity that the test user can duplicate the administrative
conditions under which the norms and data on reliability and validity
were obtained. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

A more than average effort is made by the authors of the SCAT
series to present the directions for administration in a clear and
easily followed manner. A preliminary kind of check list is presented
to alert the examiner to have the necessary forms and materials on
hand. Some attention is also paid to motivation for test-taking, and
care in making proper physical arrangements beforehand. Specific time
scheduling is suggested with several alternate plans offered, but no
data are given about the effects of these alterations on scores.
Attention is given to some of the everyday problems in test
administration that are forever plaguing the novice, such as what to
do when testees ask numerous specific questions. Proper stress is
placed upon the need for uniform and standardized presentation of the
test directions and materials.

The actual directions that are to be read are printed in red ink,
alternated with cautions to the test administrator printed in the
usual black. The red ink alternating with black is also used in the
test booklet itself in an effort to increase clarity of procedure.

F. SCALES AND NORMS

TECHNICAL RECOMMENDATION 

F1. Scales used for reporting scores should be such as to increase
the likelihood of accurate interpretation and emphasis by test
interpreter and subject. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

The SCAT series reports its scores as normalized scaled scores
equated by use of a highly specialized technique, "Lord’s maximum
likelihood method." The reader of the Manual is left a little in the
dark as to just how this equating process is done. The number and
characteristics of students used to arrive at this score expression
are not described. If the reader accepts this scaled score and its
derivation he will find that the Manual claims for it: (1) this scale
is not like most normative scales, but is a "test-defined scale" with
a unique scale for each ability sample; (2) a particular scale
describes the same ability regardless of form or level used, thus
allowing comparisons between these forms and levels over a period of
time; and (3) the score has no interpretative value in itself, but
must be understood in reference to a table of norms. Should all these
prove out over the years, the SCAT tests will have made a unique
contribution in their scoring system.

TECHNICAL RECOMMENDATION 

F4. Local norms are more important for many uses of tests than
published norms. In such cases the manual should suggest appropriate
emphasis on local norms. VERY DESIRABLE.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

Very adequate emphasis is placed upon the desirability of estab- 
lishing local norms. The school counselor who uses the SCAT 
should develop local class, school, and system-wide norms. Score 
distribution forms that may be used to facilitate this calculation 
come with the tests. Lengthy discussion of the need for and the use 
of local norms for counseling the individual student is presented 
in the Manual. Warning is given that "test scores and class 
averages should NEVER be used as administrative or supervisory 
'clubs’ to be held over the heads of teachers.” Teachers and counse- 
lors are given several suggestions as to how best to utilize local 
normative information in conjunction with the published norms in 
an attempt to increase the usefulness of the scores. 
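
A counselor who wishes to build a table of local norms from a score distribution might proceed along the lines of the following sketch. It assumes the common midpoint convention for percentile ranks (percent of cases below a score plus half the percent at that score). The function name and the ten raw scores are hypothetical; a real local norm group would of course be far larger.

```python
from collections import Counter


def local_percentile_table(raw_scores):
    """Map each observed raw score to its percentile rank in the local group."""
    n = len(raw_scores)
    counts = Counter(raw_scores)
    table = {}
    below = 0  # number of cases scoring lower than the current score
    for score in sorted(counts):
        at = counts[score]
        table[score] = 100.0 * (below + 0.5 * at) / n
        below += at
    return table


# Hypothetical raw scores for one small class.
scores = [12, 15, 15, 18, 20, 20, 20, 23, 25, 30]
table = local_percentile_table(scores)
for raw, rank in table.items():
    print(raw, rank)
```

The same routine could be run separately for a class, a school, and a school system, giving three local tables to set beside the published norms.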




TECHNICAL RECOMMENDATION


F5. Except where primary use of a test is to compare individuals with
their own local group, norms should be published at the time of
release of the test for operational use. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS

Tables of norms are presented in Appendix B of the SCAT
Manual. These are grade norms, and range from Grade 10 to
Grade 14. Separate norm tables are given for each subscore and the
total score.


TECHNICAL RECOMMENDATION 

F7. Norms should refer to defined and clearly described populations. 
These populations should be the groups to whom the users of the test 
will ordinarily wish to compare the persons tested. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

The Manual of the SCAT series describes its normative groups 
by naming the schools and colleges from which the scores used in 
norming were derived. In addition they give the size of the group, 
state the mean scores and the standard deviations, and identify 
each school by its geographic location. In this manner 35 secondary 
schools and 15 colleges were identified as being the cooperating 
schools in the collection of the normative data. 

Only when the counselor begins to inspect these data a bit more
carefully does he realize that one weakness of the SCAT at the
time of this evaluation lies in its normative data. While recognition
needs be given to the fact that the test series is a new one, and that
the Manual does state that more data will be forthcoming in the
future, both the size and the composition of the normative group
bear some close inspection. A full discussion of the basis of
sampling for the high schools involved in the normative sampling is
given in the Manual. This includes a discussion of the criteria for
eligibility established in the selection of the school, as well as a
lengthy table that affords the test user a breakdown of the normative
sample by region and size of community, by Grades 10, 11,
and 12 for the high school level of the SAT (high school) test.
The comparative population figures were taken from the biennial
report of the U.S. Office of Education. Although size of the normative
sample might be increased at least tenfold, the high school
norming procedures seem generally adequate.

Much less information is given for the college norms. Not only
are the normative figures much smaller in size, but one might suspect
some bias in the sampling when the characteristics of the 15
colleges used in the sampling are inspected. As nearly as can be
determined, more than one third of the college students are from
parochial enrollments in colleges limited to women. At least five
of the colleges are so small that they do not appear in standard
reference works on college characteristics. None of the colleges has
an enrollment of more than 805, and the average size is only 436.
One might suspect some differences in characteristics from a
broader and more representative sampling of American collegiate
youth than those presented in the norm tables. If no such differences
are to be expected, it would seem to be the responsibility of
the test publishers to make that point clear.

TECHNICAL RECOMMENDATION 

F7.1 The manual should report the method of sampling within the 
population, and should discuss any probable bias within the sample. 
ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 


According to the Manual of the SCAT series, somewhat different
sampling procedures were followed for the School Ability Test
than for the College Ability Test. In the high school sample the
population was defined as a random sample, stratified as to school
size, drawn from schools in a nation-wide survey of secondary
school characteristics. Over 1,800 schools were contacted and
approximately 50 percent replied to queries about their characteristics.
From this last group, 35 schools were selected on the basis of a
combination of geographic representativeness and school size. Just
how many of the 900 schools that replied to the queries about their
school were willing to participate in the normative study is not
clear, nor is the reader given any information as to why only 35
schools were used in the final norms.

The college test population was chosen so as to obtain colleges
as nearly as possible like those used in the normative group of the
ACE Psychological Examination for College Freshmen. No data
are presented about how many schools were considered before the
final 15 colleges were selected. The college norming sample is
compared with the ACE norming sample, but it is done by percentage
comparisons of colleges. With a total group of only 15 colleges, a
percentage is apt to be somewhat misleading. A reported 50 percent
would really be less than eight colleges.
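
The arithmetic behind this caution is simple enough to set out directly. The sketch below uses only the figure of 15 colleges given above; the variable names are illustrative.

```python
n_colleges = 15

# Each single college moves a percentage by about 6.7 points.
points_per_college = 100 / n_colleges

# "50 percent" of 15 colleges is 7.5, which is not a whole number of colleges.
colleges_at_50_percent = 0.50 * n_colleges

# Percentages that can actually occur when counting 0 to 15 colleges.
attainable = [round(100 * k / n_colleges, 1) for k in range(n_colleges + 1)]

print(round(points_per_college, 1))
print(colleges_at_50_percent)
print(attainable[7], attainable[8])
```

Seven colleges make 46.7 percent and eight make 53.3 percent, so an exact 50 percent cannot arise from a count of 15 colleges, and a shift of one college changes the reported figure by nearly seven points.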

TECHNICAL RECOMMENDATION 

F7.2 The number of cases on which the norms are based should be
reported. ESSENTIAL.


MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS


The SCAT Manual does this reporting, stating that the total
normative group consists of 11,829 cases distributed as follows:

    Grade 10    3,748        College freshmen      1,494
    Grade 11    3,038        College sophomores      953
    Grade 12    2,596

This information is again reported at the bottom of each norm
table.


TECHNICAL RECOMMENDATION 

F7.3 The manual should report whether scores differ for groups 
differing in age, sex, amount of training, and other equally important 
variables. ESSENTIAL. 

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

No mention is made of sex norms, age norms, or any category 
other than grade. While this is common practice in tests of this 
kind, a statement to the effect that expected group differences were
small would be reassuring to the test user. Of particular importance 
to the college counselor would be a clear statement about the ad- 
mission policies of the colleges used in the normative study and 
the degree of selectivity they practiced in acquiring their students. 
Only one of the colleges in the normative group is a state-sup- 
ported, coeducational institution of the kind where often the only 
entrance requirement is high school graduation. If highly selective 
entrance requirements were to hold in the other colleges reported 
in the norms, some questions could be raised about the appropriate- 
ness of the norm tables for use with students in colleges with dif- 
ferent admission policies. 

TECHNICAL RECOMMENDATION 

F7.7 The conditions under which normative data were obtained
should be reported. The conditions of testing, including the purpose
of the subjects in taking the test, should be reported. ESSENTIAL.

MANUAL OF THE SCHOOL AND COLLEGE ABILITY TESTS 

No mention is made of this point in the Manual of the SCAT.
The argument is sometimes made that error factors introduced by
such influences would tend to cancel themselves out in the long
run. Wide experience with high school youth by the authors would
seem to indicate the importance of knowing something about the
conditions of the testing situation. The Manual, for instance,
recommends that the tests be administered in a single session of
100 minutes. It suggests also, however, that the test may be
administered in two separate sessions, one in the morning and the
other in the afternoon of the same day. It would be interesting to
know whether one plan or the other was used in all the schools
participating in the norming program and, if not, whether the two
approaches produced different results.

SUMMARY 

In this chapter, the reader has been introduced to the Technical
Recommendations for Psychological Tests and Diagnostic Techniques,
and the importance of this document has been discussed.
The document represents an important step toward the future
improvement of tests. Its influence can already be noted in the extent
to which one publisher has attempted to follow the recommendations
in the design, contents, and technical data of one of its new
tests. While the SCAT is not a perfect instrument and does not
meet all the pertinent standards, many of its weaknesses and
limitations are acknowledged in the test Manual. This, it seems
certain, is due to the publisher’s attempts to follow the standards
presented in the Technical Recommendations. The potential user of
the test is provided, in general, with the type of information upon
which to base a decision as to whether or not it meets his needs
and, if selected for use, the limits placed on such use. It is hoped
that other publishers will similarly model their future tests around
the recommendations as they exist at present and as they may be
modified in the future. While the Technical Recommendations
may do much in improving the content of test manuals and tests
themselves, the counselor will need to continue his critical appraisal
of tests, for the ultimate responsibility for their use is still his.

Discussion Questions and Exercises 

1. It is held by some authorities on measurement that "standards” 
inhibit experimentation and innovation. Do you agree or disagree 
with this position? What arguments can be made for each side? 

2. From reading previous chapters of this book and reviewing a copy 
of the Technical Recommendations, are there any aspects of testing
that you believe are not adequately covered by the standards? 
What additions would you recommend? 

3. After reviewing a copy of the Technical Recommendations, evaluate
those standards categorized as very desirable and desirable.
Would you, in principle, have placed them in these categories?
What loss, if any, is suffered if a test manual omits discussion of
these standards?

4. Using a copy of the Technical Recommendations as a guide, 
analyze the manual of a test published before 1954. To what
degree is the manual deficient in terms of the recommendations? 
Is the treatment of some areas better than others? How would you 
account for the difference in treatment? 

References * 

American Educational Research Association. Technical Recommendations
for Achievement Tests. Washington, D.C., American Educational
Research Association, 1955.

American Psychological Association. "Technical Recommendations
for Psychological Tests and Diagnostic Techniques." Supplement
to Psychological Bulletin, March, 1954, 51. Washington, D.C.,
American Psychological Association, 1954.

Educational Testing Service. Examiner’s Manual, School and College
Ability Tests. Princeton, N.J., Cooperative Test Division,
Educational Testing Service, 1955.



CHAPTER VI 


Recording and Reporting Test Scores 


Some sort of cumulative record of a student’s progress has
become almost a sine qua non in educational institutions and an
essential part of that record is a section devoted to test scores. The
test section can become a sterile, uninformative, even confusing
addition to the records or it can contain vital, revealing items of
significance for use in the counseling process. If the tests have been
carefully selected, they will be made most useful if the scores
derived from them are recorded clearly, cumulated effectively, and
made available for ready interpretation by those who are to use
them. This chapter will be devoted to a discussion of how that can
be accomplished.

At the time a test is given it is not always possible to predict at
what later time and by whom the scores will be used. Various
persons may find it necessary to look at a pattern of test scores
achieved by a student over a long period of time. Scores obtained
early in a student’s career must be available in such form that
comparisons of them with later scores can be done readily by
counselees, parents, teachers and other school personnel, employers
actual or potential, admissions officers of higher education
institutions and, occasionally, persons who have such special
assignments as selecting among candidates for scholarships. Serving
the purposes of persons who are likely to have widely differing
knowledge about tests requires that several difficult decisions be
made about the recording and interpretation of test scores.

METHODS OF RECORDING

Test results are commonly recorded in tabular or graphic form
but sometimes they are presented in paragraphs. An informal poll
of the preferences of school personnel for these three methods was
conducted by the Records and Reports Committee during the Eight
Year Study of the Progressive Education Association.¹ They were
almost unanimous in their selection of the tabular method. At a
later time they were given the choice between the two forms shown
in Figure 2 for recording test results and again the vote was
overwhelmingly in favor of the first.

It seems, then, that it will be advisable to record the test scores 
in tabular form but the question of what will be reported in the 
tables must be considered. The full name of the test (or an abbre- 
viation that cannot be misinterpreted) and the form of the test 
must, of course, be recorded. The date of administration will be im- 
portant but it will not usually be necessary to indicate more than 
the month or year if the subject’s school grade is on the cumulative 
record. The common practice of recording raw scores is unnecessary 
if the derived score is given and description of the norm group is 
clearly stated, but when local norms are used it may be necessary to
report the raw score. Thus, in the example in Figure 2, it might be 
desirable to eliminate the raw score column and substitute one in 
which the norm basis could be noted, or to add another column so 
that both raw scores and norm basis could be indicated. 

In making decisions about which form of derived score to use,
several factors must be considered. The counselor must be concerned
primarily, however, with the problem of interpretability to
counselees and the minimizing of possible errors of interpretation
by those persons who may have occasion to use the data.

¹ E. R. Smith and R. W. Tyler. Appraising and Recording Student Progress. New
York: Harper & Bros., 1942.

Figure 2. [Two forms for recording test results.]

The Psychological Corporation has provided an excellent diagram
(see Fig. 3) of methods of expressing test scores. It presents
all the common methods of reporting scores and adds others such
as the Wechsler Scale and Deviation IQ methods, which may be
used occasionally.

Figure 3. [Chart of methods of expressing test scores: T-scores, CEEB
scores, AGCT scores, stanines and percent in each stanine, Wechsler
Scales subtests, and Deviation IQs.]
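
The relations among the scales named in Figure 3 can be sketched as simple linear conversions from a standard (z) score. The means and standard deviations used below are the conventional ones for these scales (for example, T-scores with mean 50 and SD 10, CEEB scores with mean 500 and SD 100); the function names are illustrative, and the stanine rule of half-SD bands centered on the mean is the usual definition, stated here as an assumption.

```python
import math


def from_z(z: float) -> dict:
    """Express a standard score z on several conventional derived-score scales."""
    return {
        "T-score": 50 + 10 * z,
        "CEEB score": 500 + 100 * z,
        "AGCT score": 100 + 20 * z,
        "Wechsler subtest": 10 + 3 * z,
        "Deviation IQ": 100 + 15 * z,
    }


def stanine(z: float) -> int:
    """Stanine for z: bands half an SD wide, stanine 5 centered on the mean."""
    band = math.floor(2 * z + 0.5) + 5
    return max(1, min(9, band))


# A score one standard deviation above the mean, on each scale.
for name, value in from_z(1.0).items():
    print(name, value)
print("Stanine", stanine(1.0))
```

A score one standard deviation above the mean thus appears as 60, 600, 120, 13, or 115 depending on the scale chosen, which is one reason a single form of score should be retained throughout a record.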

The difficulties in interpretability of a set of test scores for the
several persons who are to use them will not be lessened if several
different methods of recording them are used. It seems desirable to
choose one form and retain it throughout a student’s record. When
an occasional deviation from that form, as in the case of the
Wechsler Scales, is used, it will be desirable to supplement the
record by an explanatory paragraph.

Recording of derived scores in terms of percentiles seems likely
to be most meaningful to counselees and other persons who are not
familiar with esoteric terms used in measurement. Percentiles have
the disadvantage, as shown in Figure 3, that they are not equal
points on a scale and therefore a score that is at the 95th is farther
away from the 85th percentile than a score at the 55th is from one
at the 45th percentile. A further disadvantage is that percentiles
may be confused with percentages. Both of these difficulties and
sources of confusion are easier to clear up to a counselee than are
the problems that arise when attempts are made to explain what is
involved in any of the forms of standard scores.
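
The inequality of percentile units can be made concrete with a brief calculation. The sketch below assumes that scores are normally distributed, which is an assumption of the illustration and not a property of any particular test; it converts percentiles to standard-deviation units with the standard library's NormalDist and compares two ten-point percentile gaps.

```python
from statistics import NormalDist


def z_for_percentile(percentile: float) -> float:
    """Standard score below which the given percent of a normal distribution falls."""
    return NormalDist().inv_cdf(percentile / 100)


# Distance, in standard-deviation units, spanned by two ten-point percentile gaps.
gap_high = z_for_percentile(95) - z_for_percentile(85)
gap_mid = z_for_percentile(55) - z_for_percentile(45)

print(round(gap_high, 2), round(gap_mid, 2))
```

Under this assumption the gap between the 85th and 95th percentiles is about .61 of a standard deviation, while the gap between the 45th and 55th is about .25, so the same ten percentile points represent well over twice the score distance near the top of the scale.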

It has been found that a vast majority of a sample of test users
prefer a percentile equivalent for every possible raw score.² Using
such percentile tables, it is possible to show a counselee where he
stands in relation to some defined group and to show him that the
differences at the ends of the scale are greater than those at the
middle.³ The tables can also be used to show the differences
between percentile and percentage.

² [...] Corporation.

³ "Interpreting Test Scores to Counselees." Occupations, February, 1952,
pp. 320-322.

When the test data have been recorded by percentiles in tabular
form on the cumulative record (and there seems good reason to
believe that even the intelligence test scores would be recorded in
percentiles) the task of recording is not complete. The tester
should record any peculiar conditions in the test situation, his
observations of students’ reactions, or any facts about a student’s
general characteristics that might have affected his scores. Thus, if
it is known that a student is meticulous in checking every bit of
his work and is thus likely to be handicapped in speed tests, that
fact should be recorded. If a student is known to be exceptionally
nervous about all types of objective tests, a note to that effect
should be added. If still another is known to be particularly
disturbed by certain kinds of tests, such as those that require the use
of numbers, a notation about the circumstance should be made in
the section devoted to test interpretation. The following samples
abstracted from some student cumulative folders illustrate such
comments.

This boy finds all tests so easy that he usually hurries through
them, and spends the rest of the testing period watching his fellow
students.

Jane is so flustered at all objective testing sessions that she cannot
seem to get started until the other students have finished a good part
of the test.

Jim goes through a test quickly, doing the items that he knows he
can do first. He then goes back to the more difficult items.

Dian said that she just could not do arithmetic and that she would
never, even if other phases of it were excellent, take a job where she
would be required to make change for customers.

Clark can never shed his scorn for things academic long enough to
put forth maximum effort on tests so that all his scores are
questionable.

Larry’s scores always seem lower than one would expect from him.
His effort was not always maximum even when test situations are
carefully proctored.

As indicated in Chapter VII, many factors within and without 
the individual may influence his performance on a particular test. 
If those factors are of such strong influence that the test score must 
be seriously questioned, it will often be better to omit it from the 
test record completely and to note on the record the reasons for 
doing so. 





PROFILES 

When a student’s test scores have been recorded graphically, the
result is sometimes described as a profile or psychograph. The
object of the test profile is to set forth test results in the simplest
and clearest form for the user of test results. It may be a line
diagram that indicates the relative position of divergent standing on
various tests as well as the overall picture of his performances. It
may be a curve joining the successive points on a graph representing
individual status in each of several traits. In such cases it is a
method of recording status of an individual in each of several traits
so that the geometric pattern created may be meaningful. These
profiles are often used uncritically despite occasional reminders by
measurement experts that there are many pitfalls in their use. Such
factors as independence of the variables, inequality in scales,
reliability of subtest scores, and the relative importance of the
variables must be considered if profiles are to be used.

Since profiles are merely graphic representations of what may be
fallible test scores, they must be interpreted in terms of the varying
reliabilities of the tests and in terms of the intercorrelations among
the tests represented. If test results are to be used in profile form,
it is essential that the scores of all the tests be reduced to a common
origin and a common unit of measurement. Psychologists and
educators have rarely been able to devise compensating scales in
which units on one scale represent equal amounts of the properties
being measured by another.
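
Why reliabilities and intercorrelations both matter in reading a profile can be seen from the customary formula for the reliability of a difference between two scores. The sketch below assumes the two scores are expressed on a common scale with equal variances, in which case r_diff = [(r_xx + r_yy) / 2 - r_xy] / (1 - r_xy). The coefficients used are hypothetical and are not drawn from any test manual.

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference between two equally variable scores.

    r_xx, r_yy: reliabilities of the two tests; r_xy: their intercorrelation.
    """
    if r_xy >= 1.0:
        raise ValueError("intercorrelation must be less than 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical coefficients: two subtests each with reliability .90.
moderately_related = difference_reliability(0.90, 0.90, 0.60)
highly_related = difference_reliability(0.90, 0.90, 0.80)
print(round(moderately_related, 2), round(highly_related, 2))
```

With these assumed figures, two subtests that are each quite reliable yield a difference with reliability of only .75 when they correlate .60, and only .50 when they correlate .80. The peaks and hollows of a profile are differences of exactly this kind.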

At this point the student should look carefully and critically at
the profiles obtained from such tests as the California Test of
Mental Maturity, The Stanford Achievement Test, and the Differential
Aptitude Tests, and at the profiles obtained from such
questionnaires as the Kuder Preference Record or the Minnesota
Multiphasic Inventory. He should be sure to examine the data in
the test manual about the standardization populations, the
intercorrelation of subtest scores, the reliability of the subtests, and the
evidence (if he can find any) that the subsections are of somewhat
comparable validity and importance. If the subtests were standardized
on populations differing in quality or number, if the
intercorrelation coefficients are high, if the reliabilities vary greatly, and
if the evidence of comparable validity and importance is incomplete,
the value of the profile must be questioned. If the answers
to the questions, What are the important factors to be measured?
How can really comparable measurements be made of those factors?
are not available, it will probably be desirable to omit the use
of profiles.

Much of what has been said about profiles applies equally well, 
of course, to a column of test scores. The same precautions in 
interpretation are necessary when scores are in the usual tabular 
form, but the profile causes more difficulty because sharp peaks 
and hollows indicated by the (almost meaningless) lines of the 
profile are likely to stand out and seem to mean more to the coun- 
selee or other untrained observer than they should. The elation at 
the sight of a peak on a profile or the despondency at a sharp 
decline of a curve is likely to leave lasting impressions on a coun- 
selee or his parents. The interpretation that they draw from such a 
profile may be out of all proportion to its importance despite the 
precautions advanced by a counselor. 

WRITTEN REPORTS 

As suggested previously, scores from a test without some verbal 
interpretation are likely to be misinterpreted and those previously 
posted on a record may be misleading. In the case of tests that are 
administered individually, the recording of a score without any 
comment may result in the loss of many of the observational data 
that may be as important as the test score. To avoid such loss, a 
paragraph statement should be written. The following statements 
are samples of the kind that may be used. 




John’s IQ score of 128 places him among the upper ten per- 
cent of the population in the type of performance measured by 
the tests. He tried very hard and his anxiety to succeed was 
shown by his repeated questions concerning the correctness of 
his answers. He did exceedingly well on all the items that re- 
quired use of figures and nonverbal material but performed only 
at average levels for his age on verbal material. He was an alert, 
interesting boy who spoke very freely about himself and his 
many activities. He is left-handed. Frequent eye-blinking sug- 
gests the need of ocular care or investigation concerning situa- 
tions that provide too great emotional stimulation. 

Jane’s IQ score places her in the lower five percent of the pop- 
ulation in the type of performance measured by this test. She 
tried hard and failed often but did not seem to care whether she 
succeeded in passing tests. She failed to respond to praise and 
encouragement and seemed to be very glad to finish with the 
testing. Her attention wandered frequently and many responses 
seemed to be unrelated to the tasks she was required to do. She 
asked for many repetitions of directions and there is the possi- 
bility of a hearing difficulty that ought to be investigated. She has 
a decided squint. We suggest that she be given a thorough 
investigation to determine whether classification of mental defi- 
ciency can be made. 

Paul’s IQ score of 135 places him in the upper 2 percent of 
the population in the type of performance measured by this test. 
He did many tests four years beyond what is expected of a boy 
of his age and he did all kinds of tests on this scale with equal 
facility. He took the test as a challenging game and we feel very 
sure that this record describes his best performance. The be- 
havior problems that this boy exhibits in school might well be 
due to the fact that fifth grade is not challenging enough 
for him. We would like to give him several other tests and to 
examine other information about him to determine the advisa- 
bility of promoting him to the junior high school. 

Two difficulties with the use of paragraph reports are the time 
that is required to write them and their subjective nature. If it has 
been worth taking an hour of a student and counselor's time to 
have the test administered, it is probably worth the extra few 
minutes it takes to write the report. The statements in the report 
must, however, be considered only as those of a particular observer 
in a specific situation. Generalizations about a counselee that are 
based solely on the paragraph reports must not be made. If they 
corroborate other evidence that has been obtained previously, that 
corroboration may make the evidence more useful. If they do not, 
further investigation will be needed to determine whether or not 
the subject’s performance in test situations is really significant 
enough evidence of variability from usual performances to warrant 
consideration in planning with him. In either case the results may 
be useful in his counseling. 

CUMULATIVE RECORDING OF SCORES 

Some of the problems of recording and making available the test 
data of an individual would be serious enough if they referred to 
the scores obtained in only one year. The problems become vastly 
more complicated when longitudinal data on test performances are 
obtained. The interpretation of a set of test scores currently ob- 
tained is fraught with difficulties, but they seem to be minor com- 
pared to those that one meets when attempts are made to interpret 
a series of scores obtained over a period of years. It is difficult to 
determine whether a counselee has grown in the characteristics that 
the tester has attempted to measure because it is almost impossible 
to get comparable units of measurement. 

In the area of personality assessment, the problem of getting 
meaningful units of measurement seems impossible of solution at 
the present time. There is nothing in that area that even approxi- 
mates the metrics of difficulty that can be used with fair success in 
the fields of achievement and intelligence testing. In the latter areas 
it is assumed that the greater number of questions one can do, the 
better the score will be; but that assumption cannot be justified in 
personality measurement. Is it better, for example, to be 100 percent
responsible, extroverted, dominant, creative, free from use of 
defense mechanisms than it is to be partially so? Is the golden 
mean better than the extremes? Since in this area the best is not 
certainly the most, those who would assess personality cannot use 
the metrics of frequency and difficulty that the achievement tester 
uses. At this time no satisfactory substitute has yet been found. 

In the fields of achievement and aptitude testing, a common 
practice is to allot certain values to correct answers to questions on 
a test (usually one point) , to count them, and to conclude that the 
highest total is the best score. Still another method is to give test 
items to different age or grade groups and, by comparing their 
scores, set up age or grade norms. It is then assumed that the high- 
est scores made by the older subjects or higher grade groups are 
evidence of mental development or growth in achievement. In the 
process, assumptions of content and comparability of metrics are 
made. If such assumptions were always tenable, the interpretation 
of cumulative test records of an individual would be less difficult 
than it now is. Test-makers have seldom given test users enough 
data to assure them that scores obtained on the same subjects over 
a period of years are comparable.* They cannot be sure that equal 
increments in test scores are associated with equal increments in 
the characteristics that are presumed to be measured by the test. 
So far, the best that test-makers can do is to compare a subject's 
performance to those of a defined group in terms of the distribu- 
tion of scores of the group and to repeat that comparison with a 
later group. The units of measurement employed in the compari- 
sons are percentiles, age, grade scores, or one of the several forms 
of standard scores variously labeled as T-scores, scaled scores, 
stanines, and z-scores. They are illustrated in Figure 3. No one of 
them avoids the difficulty that they do not make equal units of 
measurement except by definition. Tiedeman points out, however, 
that this procedure is not unique to educational measurement in 
these words: "However, description is possible through use of de- 
fined units of measurement. Measurement of such concepts as tem- 
perature and speed has proceeded by means of defined units. The 
practice of defining units of measurement may be defended as 
valid as long as the results of experiments involving the units as 
defined prove useful and understandable; that is, as long as the 
results are consistent with expectations for them, were equal incre- 
ments of normalized scores associated in reality with equal in- 
crements of the characteristics measured." * 

* The efforts of the Cooperative Test Division of the Educational Testing Service in 
this direction have been particularly praiseworthy. 

After consideration of the values and limitations of several 
methods used in attempts to get equal units of measurement Tiede- 
man concludes that in terms of comparability the methods may be 
ranked as follows: (1) K-scores, (2) scaled scores, (3) T-scores, 
(4) age and grade scores, and (5) percentile ranks.* 
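
The low standing of percentile ranks in comparability can be seen in a small computation. The sketch below is an illustration only, not Tiedeman's procedure: it assumes normally distributed scores and uses the conventional definitions of the T-score (mean 50, standard deviation 10) and the stanine (mean 5, standard deviation 2, limited to the range 1 to 9). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def percentile_rank(z: float) -> float:
    """Percentile rank of a z-score, assuming a normal distribution."""
    return 100.0 * NormalDist().cdf(z)


def t_score(z: float) -> float:
    """Conventional T-score: mean 50, standard deviation 10."""
    return 50.0 + 10.0 * z


def stanine(z: float) -> int:
    """Conventional stanine: half-sigma bands centered on 5, clipped to 1..9."""
    return max(1, min(9, math.floor(2.0 * z + 5.5)))


# The same half-sigma gain, once near the mean and once in the upper tail.
gain_middle = percentile_rank(0.5) - percentile_rank(0.0)
gain_tail = percentile_rank(2.5) - percentile_rank(2.0)

print(f"T-score gain in both cases: {t_score(0.5) - t_score(0.0):.0f} points")
print(f"percentile gain near the mean: {gain_middle:.1f}")
print(f"percentile gain in the tail:   {gain_tail:.1f}")
```

Under these assumptions an identical gain in standard-score units appears as a gain of about nineteen percentile points near the mean and as fewer than two in the upper tail, so differences between percentile ranks recorded in successive years cannot be read as equal amounts of growth.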

Tiedeman has ranked percentiles as lowest in terms of compara- 
bility. It will be noted that the writers have placed them first with 
respect to their interpretability to counselees and other nontechni- 
cally trained school personnel or parents. If the claim of highest 
ranking for percentiles in interpretability is valid and Tiedeman's 
claim of lowest ranking for them in comparability (and hence as a 
poor measure of growth) are both justified, the counselor who 
wants to put scores on cumulative records must find himself in a 
dilemma. If he reports scores in terms of percentiles for better cur- 
rent interpretation to his counselees, he will be less sure of his 
data when he is called upon to answer questions about whether the 
student has been developing well, normally, or poorly. And it has 
been suggested frequently that longitudinal measures of develop- 
ment will be better predictors of future development than a cross- 
sectional picture obtained from one set of scores. 

* D. V. Tiedeman, "Has He Grown?" Test Service Notebook, No. 12. Yonkers, 
N.Y.: World Book Co. 

* See references at the end of the chapter for descriptions of these methods. 




Unfortunately, no crucial experiments have been carried out to 
help the counselor who recognizes the dilemma. He will be forced 
to rely on his own judgment. He may recognize that all methods 
have limitations and may still prefer to utilize percentiles in his 
interpretations to lay persons. He will also be aware that his judg- 
ments about growth must be tempered by his knowledge of the 
limitations of the technique. If he wishes to make studies in growth 
he will work out the more technical units of measurement from his 
raw scores, but the entering of such scores on the cumulative record 
is likely to result in more confusion than clarity to most users of 
the record. 

PREDICTION AND TRANSFER 

Much of what has been said previously in this chapter applies to 
the handling of test scores when they are to be sent on to schools 
and colleges or to potential employers. Except in rare cases, the 
form in which data are to be recorded when the student transfers 
is not specified and the counselor has a choice of methods by which 
he can meet his responsibility of making the scores meaningful. 

Unless the counselor has evidence obtained from expectancy 
tables or has, from some other source, become thoroughly familiar 
with the situation into which the counselee proposes to enter, he 
should not make predictions about probable success in it. It is the 
responsibility of an admissions officer or employer to make the 
decision as to whether an applicant will be accepted. If he accepts 
him, it is implied that he expects (predicts) successful work from 
the student or employee. If his prediction is wrong and the subject 
fails, he cannot then place the responsibility for that failure on the 
counselor who could not possibly know as well as the admissions 
officer or employer all the circumstances that the subject may find. 

The counselor may, as a guide to future action, make predictions 
of his own concerning the success of his counselees who go on to 
particular employment or training situations. From follow-up data 
he can check his predictions and use the results to help future 
counselees who plan to enter such situations. He would very probably
meet serious difficulties, however, if he were to predict success 
of counselees in situations with which he was not thoroughly 
familiar. Only occasionally will he find an institute of higher 
education that has worked out tables about performances of 
entrants from which a candidate for admission can estimate his 
chances for success. It will be the responsibility of the counselor to 
inform the student about such tables when they are available. 

The figures that follow were derived from two sets of data ob- 
tained about each member of a large freshman class of one uni- 
versity and a study of their achievement during their first semester. 
They indicate the percentages of freshmen who achieved less than a 
1.0 grade point average (which meant that they would be dropped 
from the institution), of those who achieved better than a 1.0 
average, and those who gained greater than a 2.0 average (better 
than B grades) if their test scores lay within each of the four 
quarters of two sets of data. 

The counselor may discuss this table and the test scores with a 
student who is applying for admission to this institution but he 
should not predict that the potential student would achieve at the 
level of any of the grade-point categories. He should be certain to 
make clear that odds, probabilities, or chances of success were 
based upon the performances of a group who preceded him. They 
do not certainly, despite the headings in the table, indicate the 
chances of success of any particular student. The potential student 
who scores in the top quarters cannot assume that he "has it 
made.” Nor can the counselee who scores in the lower quarters 
assume that failure is certain. Both can probably profit from a dis- 
cussion of what others have done and what they can do about 
decisions to apply for admission and the level of work they must 
do if they are accepted. 
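
The way in which such a table is built from follow-up records can be sketched briefly. The example below is a minimal illustration under stated assumptions: the counts are invented for the purpose and are not the data of the university described above, and the function name and category labels are hypothetical.

```python
from collections import Counter


def expectancy_table(records):
    """Tabulate follow-up records into percentages within each score group.

    records: iterable of (score_group, outcome) pairs, one per former student.
    Returns {score_group: {outcome: percent of that group}}.
    """
    counts = {}
    for group, outcome in records:
        counts.setdefault(group, Counter())[outcome] += 1

    table = {}
    for group, outcomes in counts.items():
        total = sum(outcomes.values())
        table[group] = {
            outcome: round(100.0 * n / total, 1) for outcome, n in outcomes.items()
        }
    return table


# Invented follow-up records for two test-score quarters (illustration only).
records = (
    [("top quarter", "2.0 or better")] * 12
    + [("top quarter", "1.0 to 2.0")] * 7
    + [("top quarter", "below 1.0")] * 1
    + [("bottom quarter", "2.0 or better")] * 2
    + [("bottom quarter", "1.0 to 2.0")] * 8
    + [("bottom quarter", "below 1.0")] * 10
)

for group, row in expectancy_table(records).items():
    print(group, row)
```

Each percentage describes what a group of earlier students did. Even in this invented example some students in the top quarter were dropped and some in the bottom quarter earned high averages, which is the reason the figures cannot be read as a forecast for any one counselee.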

Unfortunately, there are not many educational institutions or 
employers who have data comparable to those given above. Without
them prediction of success is a hazardous procedure. Even 
when some unavoidable circumstances (such as the demands of 
some colleges that a prediction of achievement be made for all 
applicants) force the counselor to record a prediction on an appli- 
cation blank, he should indicate very clearly that, even with the 
best tests currently available, his prediction must always be quali- 
fied. 

Prediction of counselees' performances is a problem about which 
counselors must be acutely sensitive. Test builders and other 
measurement persons tend to claim higher prediction performances 
than they can justify, ignore the issue completely, or dismiss some 
failures by indicating that if only a few "false positives” appear 
the attempts to predict have been justified. Even one false positive, 
in the person of an influential individual’s son for whom success in 
college was predicted and who failed to make the grade, may result 
in irreparable damage to a counseling program. The counselor will 
do well to limit himself to descriptions of his subject’s performance 
in clearest possible form, to interpretations of his data in terms of 
probabilities when he has enough evidence to do so, and to leave 
predictions to those who must assume responsibility for the subject 
in the employment or training opportunity that he enters. 

JOINT USE OF TESTS 

The counselor is not the only person in an educational institution 
who is concerned about tests. Administrators want some data on 
the general status of the students, and teachers are interested in the 
level at which their pupils perform. It has been suggested that if a 
well-selected battery of achievement tests is given for administra- 
tive and teaching purposes they may serve equally well for counsel- 
ing. The counselor may be interested in getting evidence of a 
student’s current achievement on mathematics tests in which the 
items cover the rules and symbolic systems that have been taught. 




For the counselor's purposes it may be more useful to have the 
records of performances on an achievement test that a teacher 
selects because it covers the area she has taught than to have scores 
from so-called numerical aptitude tests that sample materials with 
which the student has had no experience. The test that the teacher 
of typewriting selects may give the counselor a better measure of 
performance in an important part of stenographic work than a so- 
called stenographic aptitude test. Reading achievement test scores 
may be as useful to the counselor as to the teacher or remedial 
reading specialist. In view of these common interests in test scores, 
it may be possible to use the same test battery for administrative, 
teaching, and counseling purposes. 

There are certain limitations that must be noted, however, if 
the tests are to be used by the three groups. The use of test batteries 
that are administered to all the students tends to discourage the 
practice of tailoring the testing program to fit the needs of particu- 
lar counselees. Since the tests used for teaching purposes are likely 
to be subject matter tests, some nonacademic students may be dis- 
couraged and may not do well. Tests that may have excellent con- 
tent validity and therefore cover a given course area may have little 
predictive validity. To compensate for these limitations, however, 
an achievement battery has the advantages of assuring that the 
subject has been tested over areas covered in instruction. The joint 
use of a battery of achievement tests may save time and money and 
it may result in bringing the counseling program closer to the 
faculty. And it may tend to reduce the artificial gap that seemed 
to separate achievement and aptitude testing. 

Although the counselor participates in the selection and use of 
an all-school achievement testing program, he will probably want 
to supplement the tests with some that are designed particularly 
for use in counseling. He will want to use selected tests to be given 
to particular counselees at particular times. When the tests are 
given for any of these purposes, however, it is essential that the 
suggestions for recording and interpreting test scores previously 
described in this chapter be given full consideration. 

SUMMARY 

This chapter has been concerned with the problems encountered 
in recording and reporting of test scores in a manner that provides 
for their best use by counselors and others. It was suggested that 
test users prefer tabular to other methods of recording and that 
counselors will find the percentile procedure most useful in test 
interpretation, although more precise methods may be most useful 
for measurement of growth and in research. Inadequacies of pro- 
files were described and the need for written description of subjects’ 
performances in test situations noted. Problems of prediction in the 
transfer of counselees from one educational level to another were 
described and the advantages and disadvantages of the joint use 
of tests were considered. 

Discussion Questions and Exercises 

1. In the following table the numbers and percentages of students 
who scored in each of nine categories (stanines) in a state-wide 
testing program administered when they were in high school are 
given. (The top category is nine, the lowest is one.) The success 
of the students in their first year at X college is also given. 

Table 18. Number of students in each test category 

Stanine                         9    8    7    6    5    4    3    2    1   Total 
Satisfactory college work      43   57   59   43   28    9    1    0    0     240 
Unsatisfactory college work     2   11   16   39   39   25    8    1    2     143 
Totals                         45   68   75   82   67   34    9    1    2     383 

a. A counselee who has taken the state-wide tests and scored in 
the top (9) category wants to know what his chances of doing 
satisfactory work at X college are. What could you tell him? 

b. A counselee who scored at the (3) level asks you the same 
question. What would your answer be? 

c. What are the approximate odds that a student who has scored 
at the (4) level will fail to do satisfactory work at X college? 
What odds if he scored at the (5) level? Is it true that there 
are only two possibilities, succeed or fail, and that odds cannot 
be stated? 

d. What could you tell a counselee who had scored at the (5) 
level about probability of doing satisfactory work at another 
college? 

e. The parents of a student who scored at the (1) level insist 
that their son, your counselee, attend X college. What would 
you tell them? 

2. The following set of figures indicates the percentile position in a 
university freshman college class of students whose scores fall at 
specified percentile ranks in a state-wide testing program for high 
school seniors. 

Table 19. College Freshman Rank Indicated by High School Percentile 

If Percentile Rank in        Rank in the Freshman Class 
High School Was              of X College Would Be 
        41                            10 
        59                            20 
        68                            30 
        75                            40 
        80                            50 
        86                            60 
        91                            70 
        94                            80 
        97                            90 

a. One of your counselees scores at the median for high school 
students. He says that he wants to go to X college and wants to 
know if he is smart enough to succeed. What can you tell him 
on the basis of the data in the table? 

b. Another counselee scores at the 80th percentile in the high 
school testing program. He has found high school work easy 
and expects that the work in X college will be a "breeze.” 
What can you tell him? 

3. It is said that the students who score in the upper third of their 
high school classes on mental tests should go to college. What do 
the figures in the table suggest about such statements? 

4. The following quotations were taken from educational autobiogra- 
phies written by university students in a first course in education. 
What questions do they raise about the use of tests, inventories, 
and personality questionnaires in high schools? 

a. About this time Margie and I decided that we wanted to 
know what our IQ’s were, so we sneaked into the office 
one night when something was going on at school and 
looked. Actually, we weren’t too afraid of being caught, 
because we felt we had a right to know. 

b. During high school I never thought much about going to 
college, and thought I would become a secretary upon 
graduation from high school. I liked my business courses, 
and thought I would succeed if I went into a business 
career. On an aptitude test taken during my sophomore 
year, I ranked high in clerical and secretarial fields. This 
made me even more positive of the worth of my secre- 
tarial career. Always in the back of my mind had lurked 
the desire to be a teacher, especially of English. 

c. I was called into the office one day while in high school to 
take a social aptitude test. Unfortunately I was "mad at 
the world” that day. A few days later I was rushed into 
the office and they started asking me questions. But every- 
thing got straightened out and me and society were again 
O.K. 

d. My high school was kind of an experimental school. It 
seems as if I took tests constantly. During my last semes- 
ter, I had a senior conference with a teacher of my choice. 
She showed me the results of all my tests and interpreted 
them for me. The conference was extremely beneficial to 
me for it pointed out my strong and weak points, my likes 
and dislikes. I don’t remember what my IQ was but she 
said I had definite potentialities for a successful college 
student. My activities and interests were in music, eco- 
nomics, working with money and figures, and dealing 
with people. The conference didn't sway me towards a 
different vocational career but instead, it just strengthened 
the ideas and desires I had already formed. I certainly 
think these tests should be made available to all seniors in 
high school. 

e. Through a standard series of tests as to my capabilities 
and interests I learned in the ninth grade that my interests 
and the most likely region for success were in teaching of 
science or doing active dramatic work. Throughout the 
examination, I listed what I knew that I should do rather 
than what I actually prefer to do. 

REFERENCES 

Cook, W. W. "What Educational Measurement in the Education of 
Teachers?" Journal of Educational Psychology, April, 1950, pp. 
339-347. 

Courtis, S. A. "Personalized Statistics in Education." School and 
Society, May, 1955, 81:170-172. 

Flanagan, John C. Bulletin Reporting the Basic Principles and Proce- 
dures Used in Development of a System of Scaled Scores. New 
York: The Cooperative Test Service, 1939, 41 pp. 

Gardner, Eric F. "Value of Norms Based on a New Type of Scale 
Unit." Proceedings of the 1948 Invitational Conference on Testing 
Problems. Princeton, N.J.: Educational Testing Service, 1948, 
117 pp. 

McCall, W. A. Measurement. New York: Macmillan, 1939, 535 pp. 

Rothney, J. W. M. Appraising and Recording Pupil Progress. Wash- 
ington, D.C.: Bulletin No. 7 of the American Educational Research 
Association, 1955, p. 30. 

Rulon, P. J. "On the Concepts of Growth and Ability." Harvard 
Educational Review, 1947, 17:1-9. 

Smith, E. R., and Tyler, R. W. Appraising and Recording Student 
Progress. New York: Harper, 1942. 

Thurstone, L. L. "A Method of Scaling Psychological and Educa- 
tional Tests." Journal of Educational Psychology, 1925, 16:433- 
451. 

Traxler, A. E. "Evaluation of Methods of Individual Appraisal in 
Counseling." Occupations, November, 1947, pp. 85-91. 



CHAPTER VII 


Combining Test Scores with Other Data 


In various chapters of this volume it has been shown that the 
value of a test score for a given counselee must depend on such 
factors as the validity, reliability, and norms of the test. After a 
counselor has used an instrument that was carefully selected on the 
basis of such factors and obtained a score for a counselee, he will 
find it necessary to supplement the test score with information 
about past performances. If such information does not contain 
scores from other tests, the counselor must operate temporarily on 
the assumption that his counselee's test score is a dependable 
sample of his usual performances. He can seldom find adequate 
evidence that it is. As he examines the test performances of a 
counselee, he may become aware that such specific factors as set, 
fatigue, indifference, nutrition, motivation, purposes in life, guess- 
ing, memory, rapport, as well as more generalized factors of socio- 
economic class, previous testing experiences, or attitude toward 
self and authority figures may have been operating to a greater or 
lesser degree.* Such factors may be crucial in interpretation of the 
test performances for a given individual. 

* Robert L. Thorndike. Personnel Selection. New York: John Wiley & Sons, 1949. 
A fully developed outline of intrapersonal influences on test performance is pre- 
sented on page 73. 



Faced with the realization that any one or any combination of 
such factors may have prevented his obtaining a dependable test 
score, the counselor must attempt to determine the relationships 
among test scores and other data to be used in counseling. To do 
so he may compare test scores with other evidence about his coun- 
selee on the assumption that he will be somewhat consistent in 
several kinds of performances. Perhaps, in the case of a particular 
counselee, the current scores are in line with previous academic and 
test achievements. If they are, the counselor may feel that he has 
some dependable evidence about his counselee since it has been 
demonstrated that relatively high consistency of performance can 
be expected for about one half of a group of high school students 
when test scores, course grades, and teachers' descriptions of be- 
havior are obtained over a period of several years.* A body of 
theoretical writing * in the field of personality organization also 
bears out the contention that a generalized consistency of traits and 
behavior is common to many persons. 

Viewed in this manner, it is necessary in making interpretations 
of test scores to use longitudinal data that commonly appear on 
cumulative guidance records of counselees. With such data the 
counselor may look at his counselee's observed and recorded de- 
velopmental history rather than at the very brief sampling of per- 
formance provided by the thirty minute or even the three-hour test 
battery. By judicious combination of scores from tests and longi- 
tudinal clinical data, he may find some answers to the questions his 
counselee has raised. 

COMBINING TEST AND CLINICAL DATA FOR COUNSELING 

High school students frequently ask their counselors, ". . . am 
I bright enough to succeed in college?" The conscientious counselor 
must always preface his answer to this kind of question with a 
cautious modifier, ". . . that depends. . . ." The answer to a 
seemingly simple and direct question such as the one above must 
depend upon at least two major types of information. The first is 
introduced by the test score itself. The second requires considera- 
tion of the individual as a person and of his background. 

* Robert A. Heimann. "Intra-Individual Consistency of Performance in Relation 
to the Counseling Process." Unpublished Ph.D. Dissertation, University of Wisconsin, 
Madison, 1952. 

* Gordon Allport. Personality. New York: Henry Holt & Co., 1937. 

* Gardner Murphy. Personality: A Biosocial Approach. New York: Harper & 
Bros., 1949. 

* Prescott Lecky. Self-Consistency. New York: Island Press, 1945. 



INFORMATION PROVIDED BY TESTS 

When trying to help counselees find answers to questions such 
as the one raised above, counselors commonly give such tests as the 
Ohio State Psychological,* the Otis Quick-Scoring,* the Amer- 
ican Council on Education Psychological Examination for College 
Freshmen,* or one of the others said to measure scholastic aptitude. 
Having administered the test, the counselor then looks at the score 
and compares it to others given in the table of norms. If his coun- 
selee’s score lies near or above the 75th percentile, he is apt to 
support the counselee in his plans to go to college. If the score is 
close to the 25th percentile, he is apt to be less than encouraging 
over the prospects of the student's success. At this point he should 
combine the data provided by the test score with information he 
has obtained about typical performances of his counselee from 
other sources. In this way he may avoid serious consequences of 
misinterpretations of a single test score. 

* L. L. Thurstone and Thelma G. Thurstone. American Council on Education Psy- 
chological Examination for College Freshmen. Princeton, N.J.: Educational Testing 
Service, 1954. 

PREDICTION FROM TEST SCORES 

The counselor will have examined the claims for validity of a 
test to determine the extent to which its scores predict college 
success before he administered it. Most research workers who have 
used test scores in efforts to predict success in training (or on the 
job) report coefficients of correlation between test performance and 
future grades or other criteria of success in the range of r = .40 
to .50. Relationships of such magnitude allow the counselor to im- 
prove his group predictions over chance only about 12 to 15 per- 
cent.* In some instances he may combine test scores with such other 
predictors as rank in high school class and obtain multiple correla- 
tion coefficients as large as .60. Coefficients of this size will increase 
the accuracy of his average predictions to approximately 20 percent 
above pure chance. 
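
The percentages in this paragraph can be checked with a short calculation. The sketch below assumes the index of forecasting efficiency, E = 1 - sqrt(1 - r^2), in which sqrt(1 - r^2) is the coefficient of alienation. The function names are illustrative, and the three correlations passed to multiple_r are hypothetical values chosen for the example, not figures from any study.

```python
import math


def coefficient_of_alienation(r: float) -> float:
    """k = sqrt(1 - r^2): the share of error of estimate that remains."""
    return math.sqrt(1.0 - r * r)


def forecasting_efficiency(r: float) -> float:
    """E = 1 - k: proportional reduction in error of estimate over chance."""
    return 1.0 - coefficient_of_alienation(r)


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    for r in (0.40, 0.50, 0.60):
        gain = 100 * forecasting_efficiency(r)
        print(f"r = {r:.2f}: {gain:.1f} percent better than chance")

    # Hypothetical: a test and rank in class each correlate .50 with
    # college grades and .40 with each other.
    print(f"multiple R = {multiple_r(0.50, 0.50, 0.40):.2f}")
```

Under this formula r = .60 gives the 20 percent cited above, while r = .40 and r = .50 come out nearer 8 and 13 percent. The hypothetical pair of predictors yields a multiple correlation of about .60.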

A prediction that raises an estimate by 20 percent may be con- 
sidered good at the race track. It leaves something to be desired 
when the stakes are success or failure in the post high school career 
of a counselee. When the counselor recognizes that his tests provide 
estimates that improve upon guesses only to this modest degree, 
he must take great care in test score interpretation. He must avoid 
dogmatic, categorical statements such as ". . . you should give 
up the idea of going to college because your score is too low . . ." 
or ". . . you will never make it with these scores . . ." or ". . . 
you have the ability to succeed in training for the career of your 
choice because your score is high on this particular aptitude 
test . . ." But he is obligated to inform his counselee of his chances 
for success, based upon information gathered from actual research 
studies. If such data are available his interpretation might be stated 
generally: ". . . 20 out of 50 students from our school with test scores 
such as yours have been successful in the training in which you are 
interested." 

* The statistic "k" or coefficient of alienation from which such percentages are 
computed may be calculated for any degree of relationship. See: J. P. Guilford. 
Fundamental Statistics in Psychology and Education. New York: McGraw-Hill Book 
Co., 1956. 





EXPECTANCY TABLES 

The data from which a statement such as the above can be made 
are usually reported in an expectancy table. If such tables can be 
drawn up and used in test score interpretation, much of the mystery 
surrounding such scores can be cleared up and better communica- 
tion between counselor and counselee developed. The expectancy 
table shows relationships between individual test scores and criteria 
of success more clearly than the general trends indicated by coef- 
ficients of correlation, which counselees cannot be expected to 
understand. It is an effective method of helping teachers and 
parents to see that the relationship between scores on tests and 
criteria such as grades is not perfect, and that there is a great deal 
of overlap in the grades obtained by students who make different 
test scores. An example of one expectancy table that illustrates the 
relationship between performance on the Differential Aptitude 
Tests Sentences section of the Language Usage subtest and course 
grades in an English class is presented below. 


Table 20. Relationships Between Grades in English and Differential Aptitude 
Test Language Usage Scores* 

[Expectancy table. For each band of DAT Language Usage scores, from 0-9 
through 80 and up, the table gives the number and the percent of students 
receiving each grade (F, D, C, B, A) in an English class, with row totals. 
Column totals of students receiving each grade: F 2, D 11, C 37, B 32, A 16. 
The individual cell entries are not legible in the source.] 

* The source note is only partly legible. It indicates that numbers are 
presented in one section of the table and percentages in the other. 




For illustration it can be assumed that the counselor using this 
table is concerned with a freshman who made a score of 55 on this 
test. Of the 23 students who scored between 50 and 59, none re- 
ceived a grade lower than C. He may then tell the freshman that, 
on the basis of past experience, all students who scored as high as 
he did had made grades of C or better. This does not mean, of 
course, that this particular student might not be the exception. But 
it does give him a probability statement in nontechnical language 
that he is able to see and understand. 
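
An expectancy table of this kind can be tallied directly from past records. The following is a minimal sketch, assuming each record is a (test score, course grade) pair and that scores are grouped in ten-point bands. The function names and the sample records are hypothetical and are not the data behind Table 20.

```python
from collections import Counter, defaultdict

GRADES = ("F", "D", "C", "B", "A")


def score_band(score: int, width: int = 10) -> str:
    """Label the band a score falls in, e.g. 55 -> '50-59'."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"


def expectancy_table(records, width: int = 10):
    """Map each score band to its size and the percent earning each grade."""
    counts = defaultdict(Counter)
    for score, grade in records:
        counts[score_band(score, width)][grade] += 1
    table = {}
    for band, grade_counts in counts.items():
        total = sum(grade_counts.values())
        table[band] = {
            "n": total,
            "percent": {g: round(100 * grade_counts[g] / total) for g in GRADES},
        }
    return table


if __name__ == "__main__":
    # Hypothetical (score, grade) pairs for illustration only.
    records = [(55, "C"), (52, "B"), (58, "A"), (51, "C"),
               (34, "D"), (36, "C"), (31, "B"), (38, "C")]
    table = expectancy_table(records)
    for band in sorted(table, key=lambda b: int(b.split("-")[0]), reverse=True):
        row = table[band]
        cells = "  ".join(f"{g}:{row['percent'][g]:3d}%" for g in GRADES)
        print(f"{band:>6}  n={row['n']:2d}  {cells}")
```

Reading across one row gives the kind of statement described above: of the students in that score band, what percent earned each grade.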

The data in the table illustrate the difficulty in prediction from 
scores in middle ranges. It may be noted that equal numbers of 
students with scores between 30 and 39 made B's and D’s, al- 
though one received an F and slightly over half of the students 
with this score did achieve average grades of C. Data obtained 
from sources other than tests may prove helpful in this instance 
if they are considered by the counselor along with the test score. 

Errors in prediction. Before the counselor can use such statistics 
to help in test interpretation he must demand that certain 
conditions be met. He must satisfy himself that his counselee does 
not differ markedly from the students used in reported studies 
with the tests he is considering in terms of their home background, 
socioeconomic and cultural level, and educational opportunity. He 
should test the assumption that the grading practices and the range 
of student talent in the schools used in the reported validity studies 
are not greatly different from the schools his counselee has at- 
tended. And he should be sure that the reported coefficients are 
based upon sound sampling and carefully controlled studies. 

The reader should keep in mind that all statistical presentations 
of relationships between test performance and future success are 
based on the averages of groups, and may or may not be appropriate 
for particular persons. Consequently any efforts at prediction based 
upon the average of the group may be less than perfect for the 
individual members of it. The error in prediction resulting from 
this fact is similar in a general way to the error that accompanies 
each test score as an estimate of an individual's true score (see the 
discussion of standard error of measurement in Chapter III). The 
error in prediction for a particular individual is unknown. With 
the use of the generalized standard error of estimate, however, the 
counselor can establish the gross probable limits of his misses in 
prediction for particular counselees. This range may be fairly wide 
and will often limit the predictive usefulness of the measure. 
Except when extremely high or low scores are obtained, the 
counselor can find little in test manuals that can be used with 
precision in answering questions raised by counselees. If, in using a 
test in which the reported validity is r = .60, the counselor finds 
that one of his counselees scored at the 50th percentile, he will 
realize that the expected range for this person on the criterion 
measure lies somewhere between the 21st and 79th percentiles two 
times out of three! This rather unspecific, imprecise prediction 
should make the counselor humble and wary of being dogmatic in 
pronouncements of probable future success or failure of a 
counselee that are based mainly on a test score. 
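
The range from the 21st to the 79th percentile can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: test and criterion are taken to be jointly normal and expressed in standard scores, so the standard error of estimate is sqrt(1 - r^2), and "two times out of three" is read as one standard error on either side of the predicted score. The function name is illustrative.

```python
import math
from statistics import NormalDist


def criterion_percentile_band(test_percentile: float, r: float, z_width: float = 1.0):
    """Percentile limits on the criterion within z_width standard errors of estimate."""
    std_normal = NormalDist()
    z_test = std_normal.inv_cdf(test_percentile / 100)
    z_predicted = r * z_test            # prediction regresses toward the mean
    se_estimate = math.sqrt(1 - r * r)  # standard error in standard-score units
    low = std_normal.cdf(z_predicted - z_width * se_estimate)
    high = std_normal.cdf(z_predicted + z_width * se_estimate)
    return 100 * low, 100 * high


if __name__ == "__main__":
    low, high = criterion_percentile_band(50, 0.60)
    print(f"criterion percentiles {low:.0f} to {high:.0f}")
```

With r = .60 and a score at the 50th percentile, the band runs from about the 21st to the 79th percentile, in agreement with the figures in the text.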

[Several lines are illegible in the source. The text resumes mid-sentence.] 

point scale such as pass or fail, or he may decide to use a simple 
three-point scale. If such coarse criteria will serve his purposes, he 
will find that interpretations can be made with a greater degree of 
assurance than if he were working with such matters as grade point 
averages, rank in class at graduation, units of production, future 
sales in dollars or precise percentile standing. When test scores 
are to be used for selection purposes and a fairly coarse criterion 
is used, the percentage improvement over chance of even a mod- 
erate coefficient becomes more useful. The following example from 
a bulletin of The Psychological Corporation illustrates the case. 

What permits us to use tests effectively even though their validity 
coefficients are considerably lower than .866? First, there is the matter 
of precision. The standard error of estimate refers to the band of 
error around predictions of precise, specific rankings of each 
individual on the criterion. In most practical work, such precision is 
unnecessary. We do not ordinarily need to predict that John Jones 
will be exactly at the 85th percentile in a college class, or that Bill 
Smith will be 19th in a group of 25 engineering apprentices. We 
are far more likely to be concerned with whether Jones will survive 
the first year in college, or whether Smith will be one of the 
satisfactory apprentices. For these purposes, whether Jones is at the 75th 
percentile or 90th percentile is of lesser moment; we can make a quite 
confident prediction that he will succeed, even though there may be 
a fair-sized standard error of estimate applicable to the specific 
percentile our formula predicts. 

A second factor working in our favor in the practical use of tests 
is that, as the opening quotation notes, predictions are most accurately 
made at the extremes, and it is the extremes that are of greatest 
interest to us. Few colleges grant large scholarships to more than 10 
or 20 per cent of their students. Few colleges fail as many as half 
their students and few industrial firms fire as many as half of those 
they hire. More often, the failures are 10 per cent or 20 per cent or 
possibly 30 per cent: the extremes. Thus a test which does not 
predict with accuracy whether students will be at the 40th percentile 
or the 60th percentile, can still do a valuable service in predicting that 
very few of the high scorers will be in the 20 per cent who fail during 
the freshman year, or that hardly any scholarship winners will be 
academic failures. In industrial selection, a test of moderate validity 
can be efficient in quickly screening out the "clearly ineligible" from 
the "clearly eligible." There will remain an indifferent zone of test 
scores for persons in the "eligible" range; for them, other 
considerations than test scores may determine whether they should be hired. 

Let us look at some data. One hundred ninety-one eighth-grade 
boys took the Verbal Reasoning Test of the Differential Aptitude 
Tests (DAT) battery at the start of a term. At the end of the term, the 
grades they earned in a Social Studies course were obtained. Seventy- 
six were found to have earned grades of D or lower; they represented 
40 per cent of the total class. On the basis of chance (i.e., using a 
test with zero validity), we should expect to find that 40 per cent of 
those at each test score level, low, medium or high, obtained grades 
of D or lower. The coefficient of correlation between the test scores 
and these grades was .61, for which the index of forecasting efficiency 
comes out to just 20 per cent better than chance, hardly enough to 
notice. Table [21] reveals a very different story: it shows the test to 
be a highly efficient predictor for the school's purposes! Instead of 
40 per cent of the highest-scoring pupils being found in the low 
grades group (as one would expect by chance), only six per cent are 
found there. 


Table [21]. Chance Expectations and Actual Performance in a Social Studies 
Class in Relation to DAT Verbal Reasoning Scores 

DAT Verbal                     % Expected by       % Actually 
Reasoning        No. of        Chance to Earn      Earning 
Test Score       Pupils        D, E, or F          D, E, or F 

26-up              19               40                  6 
18-25              49               40              [illegible] 
10-17              60               40              [illegible] 
Below 10           63               40                 73 


* "Better than Chance," Test Service Bulletin. New York: The Psychological 
Corporation, May, 1953, pp. 9-10. 

Despite the remarkable improvement over chance illustrated 
above with a three-point criterion scale (D, E, or F), the 
counselor has the task of estimating whether or not his particular 
counselee will be among the high-scoring 6 percent who will still 
receive poor grades, or among the bottom-scoring 27 percent who 
did not. He is faced with the problem of attempting individual 
prediction from statistics based upon the average performances of 
groups. This is a real dilemma. He must preface his answer to his 
counselee's question about his probable success in college with 
". . . that depends." 
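
The chance expectation in the quoted example is simply the class-wide rate of low grades applied to every score band. The sketch below works that out from the figures reported in the text (191 pupils, 76 of them with low grades). The function name is illustrative, and the size of the lowest band, 63, is an assumption inferred as 191 minus the three legible band sizes.

```python
def chance_expectation(band_sizes, total_low_grades):
    """Expected low grades per score band if the test had zero validity."""
    total_pupils = sum(band_sizes.values())
    base_rate = total_low_grades / total_pupils
    expected = {band: n * base_rate for band, n in band_sizes.items()}
    return base_rate, expected


if __name__ == "__main__":
    # Band sizes 19, 49, and 60 appear in the table; 63 is inferred (assumption).
    band_sizes = {"26-up": 19, "18-25": 49, "10-17": 60, "below 10": 63}
    base_rate, expected = chance_expectation(band_sizes, total_low_grades=76)
    print(f"base rate: {100 * base_rate:.0f} percent")
    for band, count in expected.items():
        print(f"{band:>9}: about {count:.1f} pupils expected by chance")
    # The text reports that only 6 percent of the top band earned low grades.
    print(f"top band, reported: about {0.06 * band_sizes['26-up']:.1f} pupils")
```

Set side by side, the chance figure for the top band (between seven and eight pupils) and the reported figure (about one pupil) show why the coarse criterion makes the test look far more useful than the 20 percent index suggests.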

Selection or counseling? When test scores are used for 
selection purposes rather than counseling, the problem may be one 
of determining which of many applicants for training are most 
likely to succeed. Another selection situation might require the 
estimation of the grade point averages for all members of an enter- 
ing class of college freshmen. In cases such as these the admissions 
officer or personnel manager is concerned with the accuracy of 
prediction at any degree above chance. If his predictive equation 
misses a few cases, the failure may be dismissed as chance error, 
and he may be satisfied that he is more often right than not. And 
in certain instances the percentage of misses, as in the illustrations 
above, may be small. Even if these misses are but one, ten, or 20 
out of 1,000, they are of much greater consequence to the coun- 
selor than to the admissions officer or personnel manager. For the 
counselor has deep personal concern for each of the 1,000 persons. 
His major responsibility is to the optimum development of each 
individual counselee. 

The counselor who has followed the discussion above must real- 
ize that, since most test manuals report validity coefficients in the 
range of .40 to .60, predictions based upon scores from tests im- 
prove pure guessing by approximately 10 to 20 percent on the 
average. How can a counselor improve his predictions? What in- 
formation needs to be combined with test scores in order to 
increase predictive efficiency? What additional evidence of 
performance of the counselee needs to be weighed in attempts to answer 
the questions raised by counselees, their parents, and their 
teachers? What must be considered when the counselor says ". . . that 
depends . . ."? 


INFORMATION PROVIDED BY PERSONAL DATA 

Every experienced counselor has worked with counselees for 
whom test scores do not seem to tell the whole story. The student 
with a marked reading difficulty, for example, may be handicapped 
on many standardized tests. With elementary school pupils it be- 
comes difficult to determine whether their intelligence test scores 
are low because they cannot read well or cannot read well because 
they lack "intelligence." Study of clues from the pupil's home 
background, his everyday performance in class, his social class 
status, or his overall behavior may assist in solution of the prob- 
lem. Some students test high but achieve relatively poor success in 
their academic attempts. Others do the opposite. Prognosis for 
success of some students in post high school training or in certain 
occupations may be determined not only by their test scores and 
school achievement but also by the pressures of their families. For 
some pupils such pressure produces a high state of tension that 
reduces their efficiency to a marked degree. 

[The opening of this case example, which concerns a counselee named Jerry, 
is largely illegible in the source. One legible fragment notes that he 
admitted he was uninterested in West Point. The text resumes mid-sentence.] 

writing. His leisure-time activities and stated interests were more 
in line with a vocational goal in literary areas than in 
engineering. By full exploration of various background factors with 
Jerry, his teachers, and his father the counselor could make a 
more meaningful interpretation of his test scores in relation to 
his vocational choice. 

TEACHERS' EVALUATIONS 

Teachers' marks are still the unreliable, invalid, but indispensa- 
ble evidence of success in our schools. They are, and will probably 
continue to be for many years, the "coin of the realm" in our 
schools. As such they are important supplements to test data in the 
appraisal of current and future achievement of students. High 
school grades are strongly influenced by teachers who tend to re- 
ward children of the higher social classes, give better marks to 
girls than boys, and become influenced by such factors as effort, 
neatness, or "apple polishing."* It should not come as a surprise, 
despite the unreliable nature of grades, that test scores are some- 
times not as efficient predictors of grades in college as previously 
earned high school grades. 

Grades contribute a dimension that "on the spot" testing fails 
to provide. They may provide a longitudinal frame of reference 
with which to view the counselee's total performances. The four- 
year record of marks of a high school senior can furnish important 
information for the counselor to supplement the testing record. As 
teachers' reports become more analytical and diagnostic, their value 
in portraying the typical daily behavior of students will be greater 
than that of the summation type of grade.** 

* See: Harl Douglass and N. Olson, "Relations of High School Marks to Sex in 
Four Minnesota High Schools." School Review, April, 1957, 45:262-288. Robert S. 
Carter, "How Valid are Marks Assigned by Teachers," Journal of Educational Psy- 
chology, April, 1952, 43:218-228. William S. Learned, "What's in a Mark?" The 
Carnegie Foundation for the Advancement of Teaching, Thirty-Seventh Annual Re- 
port. New York: The Carnegie Foundation for the Advancement of Teaching, 1940, 
36 pp. 

** Eugene R. Smith and Others. "Appraising and Recording Student Progress, 




Testing provides a sample of behavior within a highly structured 
and somewhat artificial situation. Longitudinal analysis of the 
counselee's normal operation in a functional situation made by 
observing teachers may bring to light characteristics completely 
missed in testing. These may include work habits, social skills, atti- 
tude toward schooling, values, and level of aspiration as well as a 
host of others. As more teachers become well trained in observa- 
tion, child study, and diagnostic procedures, psychological testing 
may become relegated to even a lesser role than it has now in the 
process of counseling. The experienced counselor is apt to 
give more weight to a capable teacher's estimate of a student's per- 
formances than to his test scores despite the former's lack of 
quantification. 


Counselors are aware that grades are often rewards for 
compliant behavior, accuracy, neatness, dependability, and willingness 
of the student to assume the teachers' outlook and values. Viewed 
in this manner, grades may provide estimates as to how the 
counselee will function in some future situation in areas not tapped by 

[Several lines are illegible in the source, including the end of this 
sentence and the remainder of a footnote. The text resumes mid-sentence.] 


marks in relationship to all marks given in the school. In this 
manner he has a useful way of understanding the fact that an "A" 
from Miss Brown is given as often as a "B" in Miss Smith's classes 
and consequently has less suggestion of excellence than an "A" 
grade from Miss Smith. In knowing some of the possible variables 
that lie behind the awarding of marks by teachers in his school, 
the counselor is in a better position to appraise the meaning of his 
counselee's grades in relation to future academic requirements and 
courses. 
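
One way to put two teachers' marks on a common footing is to ask how large a share of each teacher's marks is at or above a given grade. The sketch below assumes the counselor has a tally of the marks each teacher has awarded. The function name and the counts for Miss Brown and Miss Smith are hypothetical, chosen only to mirror the situation described above.

```python
GRADE_ORDER = ("F", "D", "C", "B", "A")


def percent_at_or_above(distribution, mark):
    """Percent of a teacher's marks that are at least as high as `mark`."""
    total = sum(distribution.values())
    cutoff = GRADE_ORDER.index(mark)
    at_or_above = sum(distribution.get(g, 0) for g in GRADE_ORDER[cutoff:])
    return 100 * at_or_above / total


if __name__ == "__main__":
    # Hypothetical tallies of marks awarded by each teacher.
    miss_brown = {"A": 40, "B": 35, "C": 20, "D": 5, "F": 0}
    miss_smith = {"A": 10, "B": 30, "C": 40, "D": 15, "F": 5}
    for label, dist, mark in (("Miss Brown", miss_brown, "A"),
                              ("Miss Smith", miss_smith, "A"),
                              ("Miss Smith", miss_smith, "B")):
        share = percent_at_or_above(dist, mark)
        print(f"{mark} from {label}: top {share:.0f} percent of her marks")
```

With these made-up tallies an "A" from Miss Brown and a "B" or better from Miss Smith both mark the top 40 percent of a class, while an "A" from Miss Smith marks the top 10 percent.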

The following examples from counseling files may further illus- 
trate the interaction of grades and test scores. 

SHELIA 

Shelia, a freshman in college, appealed for counseling to see 
"what was the matter" with her current classroom performances 
since she was receiving D's and F's in nearly all her college 
courses. This record came as a distinct shock to her because she 
had graduated fifth in a class of 136 high school students, and 
had always thought of herself as an above-average student. 

Her test record follows: 

Test                                   Year          Score 

Otis, Gamma                            10th grade    IQ 110 
Ohio State Psychological               12th grade    65th percentile 
Kuhlmann-Anderson                      12th grade    IQ 117 
Cooperative English                    13th grade    Total 90th percentile 
American Council on Education 
  Psychological Examination 
  for College Freshmen                 13th grade    Total 50th percentile 


Examination of her test record alone seems to indicate that 
she would succeed in college level work. The ACE score is some- 
what lower than the other scores, but this could be laid to the 
stress of freshman week and the tensions and excitement of the 
period during which the test was administered. 

Following are examples of what her high school teachers had 
to say about her classroom performances: 

English, 12th grade: "Shelia is a good student who always hands 
her work in on time, and seems to do more than is expected of 
her. She is a pleasure to have in class for she seems to love school 
and always tries her best. I can always count on her to come 
through no matter what the assignment." 

English, 11th grade: "Shelia tried very hard to please. I some- 
times think she has the reputation for being a good student 
simply because she gives such a good impression." 

Social Studies, 12th grade: "This girl does much better work 
when she is following a definite assignment. She seems lost when 
she has to dig out reference materials in the library and needs a 
great deal of help when asked to work on her own." 

High School Counselor: "Shelia has had a lot of pressure from 
home to achieve. Her mother is constantly worrying about her 
grades. She is very determined not to let her mother down and 
works long hours to get her work done. She is going on to 
college without any definite vocational goals and seems unable 
to make a vocational decision lest it displease her mother." 

College Counselor: "Shelia is greatly troubled about the pres- 
sures from home. She stated in her initial interview with the 
college counselor that she was afraid and lost at college. She 
seemed unable to proceed alone and unguided as she had not 
been required to do in the more protective high school setting. 
She was somewhat at a loss on what procedures were available to 
make a personal impact on her instructors. Her promptness and 
compliancy, which paid off in the high school setting, did not 
seem to be effective in the college situation, and she was near 
panic because of this. She was studying until midnight nearly 
every night and week-end, but found little reward from her 
continual efforts to compensate for the lack of quality of work with 
sheer quantity." 

The counselor tried to help her to see that, while she had an 
average amount of talent as measured by various tests of aca- 
demic promise, she would be unable to succeed in college with 
the same techniques of pleasing teachers, doing extra work, or 
just working harder and harder as she had done in the high 
school. She was given help in developing new study habits and 
shifting her level of aspiration somewhat downward in an effort 
to provide her a more realistic understanding of herself. 

Why the high school counselor did not temper his evaluations 
of Shelia by careful appraisal of her high school teachers' qualita- 
tive assessments (each of these teachers had given her A's) is not 
known. Certainly this girl should have had a more realistic degree 
of self-appraisal before entering college where she was forced to 
meet reality. Her overall grades were very high, but a closer analy- 
sis of her actual work habits and techniques for getting good 
grades while in high school would have supplemented knowledge 
of her potential performance in conjunction with her high test 
record. Analysis of the disparity between her ACE score and her 
Cooperative English score pointed to specific strength in English 
grammar, but also provided evidence of her average performance 
when the task was to generalize and abstract this learning. 

Another example of a counselee whose test record and academic 
achievement need qualitative analysis because of the various fac- 
tors involved in teachers’ evaluations is given below. 

JEFF 

Jeff requested counseling from a college Guidance Center 
during the summer following high school graduation, and prior 
to his enrollment in college. He was concerned about his inability 
to earn above average grades while in high school, but upset 
because his high school teachers and counselors had always 
encouraged him to go to college despite this mediocre academic 
record. They had told him that he had "lots of ability, if he 
would only use it." 

His rank in class at the time of graduation was 378 out of a 
class of 459 students. His mother called the college counselor 
and expressed her anxiety over his chances for success in college 
because of his low scholarship. She expressed her own hopes that 
he might succeed. She stated that she had been told by the high 
school counselor that he did much better on tests than he did 
in classwork. 


His testing record follows: 












[Jeff's test record and the opening of the first counselor's report are 
missing from the source. The text resumes mid-sentence.] 

is very shy. He is not popular with the girls but enjoys a certain 
popularity with some of the boys because he is a good fellow 
and is willing to undergo their ridicule and laughter. His appear- 
ance is against him. He is tall and very thin and wears very thick 
glasses." 

Sophomore year counselor's report: "Jeff has been in to see me a 
number of times this year at the request of nearly all of his 
teachers. They report he 'cuts up' in their classes and does not 
get his work in on time. His library privileges have been with- 
drawn because of his loud talking and giggling in the library 
despite repeated warnings. He is the 'clown' of his class and is 
constantly seeking attention in classes by talking out of turn, 
making unnecessary and sometimes rude remarks to his teachers, 
and generally 'getting in trouble.'" 

Junior year counselor's report: "Jeff has never learned to work 
well in classwork. He finds it easy to stay home and misses 
school frequently. His parents are planning a college career for 
him, but he seems only mildly interested. The father is very 
much discouraged because he knows that Jeff's academic record 
will preclude his entrance to any high status college." 

Senior year counselor's report: "Jeff seems to take pride in boast- 
ing that he 'did nothing and got away with it . . .' I believe 
that this is a defense mechanism and he is not fooling even him- 
self. In class he is very quiet, but on occasion is surly and rude 
when requested to do his classwork by his teachers. He seems 
now to have a 'chip on his shoulder' all the time." 

(In both of the illustrative cases key phrases have been under- 
lined in writing the reports to accentuate the attitudes, skills, and 
behaviors that were crucial in evaluating the grades these high 
school students received.) 

Evaluation of Jeff's academic performances made at the time of 
his high school graduation would have indicated that he was not 
likely to succeed in college level studies. When Jeff sought counsel- 
ing from a college Guidance Center midway through his first year, 
he was failing in nearly all his college course work. By the middle 
of his second college year he was doing average work, largely be- 
cause of the efforts of his counselor, who helped him to change his 
attitude toward his father and other authority figures. Many hours 
were spent with Jeff in the Reading Clinic to remedy his poor study 
habits and to prepare him better in terms of needed skills to attack 
his college course assignments. He lost most of his belligerent atti- 
tude as he clarified his vocational goals and stopped fighting the 
world. Relieved of some of the pressures and tensions that had 
troubled him all through high school, he was able to use his talents 
to better advantage and achieve some measure of success. Looking 
back, his college counselor was able to evaluate Jeff's poor grades 
in high school in terms of poor study skills plus a negative attitude 
toward school work. 


These case examples clearly point out the need for a careful 
qualitative analysis of the grades a student receives in high school. 
Shelia received good grades in part because of her compliancy and 
willingness to agree to her teachers' demands. Jeff was penalized 
without regard for his level of talent by his refusal to do so. 
[The rest of this paragraph is largely illegible in the source. Legible 
fragments refer to the grades or the test performance, to careful 
interpretation and an understanding of the factors involved, and to 
aiding these college students in development of a realistic picture of 
themselves.] 


SOCIAL CLASS STATUS 


[The opening lines of this section and the footnotes printed with them are 
illegible in the source. Legible fragments of the footnotes mention the 
University of Chicago Press, 1951, and Educational Theory, August, 1951. 
The text resumes mid-sentence.] 


be found among children of higher socioeconomic positions on 
the theory that those families which had accumulated property and 
position gained it because of their greater native intelligence. It 
was said that they kept their positions because their children also 
had superior brains and achieved superior test and scholastic per- 
formances. The future success in the world of children from these 
families was assured, so this argument went, but the acquisition 
of family wealth, position, and power was reinforced by their su- 
perior intellectual endowment. Recent analysis of test scores of 
members of varying social classes, as undertaken by Eells and 
others, suggests that the differences between scores of children of 
different social classes may be more a function of the test items 
than inherited intelligence. While this point of view has not been 
fully accepted, it has created enough sensitivity to this issue to 
make the consideration of a counselee's social class status more 
important than before in the evaluation of his performance on 
tests.* A typical study comparing the performance differences of 
social class position with respect to school grades and performance 
on the Henmon-Nelson Tests of Mental Ability showed group dif- 
ferences above the 1 percent level of significance and in favor of 
middle classes over upper-lower classes.** 

The counselor who works with a student from other than a 
middle-class home might question the validity of a score derived 
from a test that was based on middle-class constructs. He might 
conclude that a superior test performance achieved by a counselee 
from a lower social class was even more meaningful than it seemed 
when it was compared with general norms. In some situations a 
counselor might find it necessary to develop social class norms as 
aids in his test interpretations. He will certainly want to gather 
more data about the individual counselee than those provided by 
scores on group paper-and-pencil tests that are highly weighted 
with verbal items. Such items may penalize a student from a lower- 
class home. 

* Quinn McNemar. "Review of Intelligence and Cultural Differences." Psycho- 
logical Bulletin, July, 1952, 49:370-371. 

** Robert A. Heimann and Quentin Schenk. "Relations of Social Class and Sex Dif- 
ferences to High School Achievement." School Review, July, 1952, 49:213-221. 

Still another variable, the cost of higher education, needs to be 
integrated into the total picture before the statement, ". . . that 
depends . . . ," can be completed. It has been shown that more 
than 40 percent of the top quarter of superior high school gradu- 
ates do not go on to post high school training. There is evidence to 
indicate that, with the exception of the very superior students, there 
is a high relationship between college attendance and the socioeco- 
nomic level of the family unit.* Some studies** indicate that nearly 
four times as many college youth come from homes of professional 
parents as from homes of farmers and laborers. 

It seems clear from the above that the social class variable must
be considered when the high school counselor is trying to help a
student to estimate his chances of success in post high school train-
ing.

The value systems of middle and upper classes seem more in
tune with the demands of academic schooling. This is particularly
[...] education. [...] identification with the teacher. Students of
higher social classes are [...] that creates and [...]
may thus have an advantage in the accumulation of higher marks,

[...] Minneapolis: University of Minnesota Press, 1954.

[...] to College with Those Who Do Not." [...] November, 1949, 40:405-414.

[...] Series, III. New Haven: [...] American Ethnic [...] 1945. Theodore
Caplow. The Sociology of Work. Minneapolis: University of Minnesota Press, 1954.




which influence the awarding of scholarships and meeting college
entrance requirements. Their records may reflect the fact that 
teachers tend to identify with, and accept more readily, those stu- 
dents whose backgrounds encourage learning of the middle-class 
curricula taught in many schools.** 

Recognition of possible bias in test performance owing to cul- 
tural handicaps should make a counselor careful in the inter- 
pretation of scores in relation to vocational or training goals. A 
counselee who comes from a distinct subculture or a minority 
group needs more than casual consideration, owing to the strong 
possibility that his test performance may present a distorted pic- 
ture of his real potentialities. Adequate tests and adequate inter- 
pretative norms are lacking for many of our minority groups, who 
may be unduly handicapped in seeking vocational goals by their 
low test performances. 

The counselor must recognize the possibility of cultural bias in 
operation as he works with students from all classes on problems 
of vocational choice. In some cases this may mean helping a stu-
dent who comes from a lower-class family, but who has oppor-
tunity, to adjust his vocational sights upward. In other cases it may
mean aiding a high school student from an upper-class home to 
consider a more realistic vocational choice than that sought by his 
family. In schools with a large group of Spanish-American, Indian, 
or other bicultural groups and first generation children of foreign- 
born parents, counselors can anticipate that many of the students 
will get low test scores. This is particularly true in the early grades. 
Many of these lower-class students drop out of school before high
school graduation. In many cases they do not even enter high
school, or if they are forced to do so by compulsory attendance
laws, they leave as soon as it is legally possible.**

** W. Lloyd Warner, Robert J. Havighurst, and Martin B. Loeb. Who Shall Be
Educated? New York: Harper & Bros., 1944.

** National Manpower Council. A Policy for Skilled Manpower. New York:
Columbia University Press, 1954.




The answer to this loss lies in more than just adequate coun- 
selor appraisal. It demands complete restudy of the school’s aims 
and curriculum by the school and community. Until this happens 
the counselor is in the best position in the school to help identify 
potentiality in the early school grades. He can encourage youth of 
lower social classes to stay in school until graduation and help 
them to plan their post high school training. Teachers and coun- 
selors who are interested in more than the development of com- 
pliant behavior and mass conformity can help to identify talented 
children of all social classes very early in their school careers. En-
couragement and assistance over a period of several years are nec-
essary to aid them to raise their vocational aspirations above the 
usual pattern of early school leaving and quick entrance to the 
labor market.** 


MOTIVATION AND OTHER INTRAPERSONAL FACTORS

The study by Hollingshead mentioned above revealed that many
youth do not consider college training seriously because of unfa-
miliarity with and lack of acceptance of the contribution of educa-
tion to vocational and personal satisfaction. Counselors who work
with high school youth are familiar with the talented youth who
[...] from the curriculum of his school, and [...] diploma. His
[...] world where jobs, pay envelopes, and cars provide real
satisfaction.

[...] students [...] and in his teachers, to work
slowly and methodically and to check carefully for mistakes is

[...] 35:175-178.

See the case of Clark in John W. M. Rothney. The High School Student. New
York: Dryden Press, 1954, pp. [...]




likely to get so few test items completed that his score will be low.
While this behavior may be significantly important in the pursuit
of certain careers or of some classwork, it tends to be a handicap
in test-taking. This is especially true when subtests have time limits 
of three to six minutes and are designed to measure speed rather 
than power. In such cases the teachers’ analytical judgments of the 
student’s usual behavior may be important in interpreting the test 
score. 
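
The handicap of the careful worker on a speeded test can be put in arithmetic form. The following is a minimal sketch, assuming simple number-right scoring in which unattempted items earn nothing; the two students and their figures are hypothetical and serve only to show how the raw score and the accuracy on attempted items can point in opposite directions.

```python
from dataclasses import dataclass


@dataclass
class TimedResult:
    attempted: int  # items reached before time was called
    correct: int    # items answered correctly

    @property
    def raw_score(self) -> int:
        # Number-right scoring (an assumption): unreached items earn nothing.
        return self.correct

    @property
    def accuracy(self) -> float:
        # Proportion correct among the items actually attempted.
        return self.correct / self.attempted if self.attempted else 0.0


# Hypothetical students, figures invented for illustration only.
methodical = TimedResult(attempted=20, correct=19)
rapid = TimedResult(attempted=40, correct=28)

for name, result in (("methodical", methodical), ("rapid", rapid)):
    print(name, result.raw_score, round(result.accuracy, 2))
```

On these figures the methodical worker earns the lower raw score while being the more accurate of the two, which is why the teacher's judgment of the student's usual behavior may be needed to interpret the score.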

Test authors seldom pay much attention to the problems of 
motivation in their manuals although it is known that perform- 
ances can be influenced by the attitudes of the subject toward the 
testing situation. In a school visited by one of the authors, senior 
students confided that they had "faked" low test scores on their
placement tests so that they would be placed in lower or easy 
sections. They knew that they could make better marks with less 
effort in such classes. Some research suggests that when the coun- 
selee participates in the choice of tests, motivation in the testing 
situation is greatly improved. In mass testing programs motivation
is frequently low because the counselee fails to realize that the 
scores may give him some important answers in terms of his goals 
and decisions. Too many schools and too many teachers have re- 
jected testing, too, because they have seen situations in which the 
tests were given and the results securely filed away out of sight of 
everyone concerned. If, in the interests of economy of time and 
effort, an administrative decision is made to test a large group of 
students at one time (and such action is more the rule than the 
exception), attempts should be made to get all students moti- 
vated. Group guidance classes, home room discussions, assembly 
talks, and films may be utilized to awaken interest on the part of 
the testees by helping them to see how the forthcoming test results 
might be useful in their plans. If clear recognition is given to the 
procedure by which each student will be apprised of his test per- 

Ray H. Bixler and Virginia H. Bixler. "Test Interpretation in Vocational Coun-
seling." Educational and Psychological Measurement, Summer, 1946, 6:145-55.




which there is uncritical acceptance of test results in a fixed, dog- 
matic, and rigid manner. With such counselees test results, im- 
properly assessed or inadequately interpreted by the counselor, may 
cause sharp alterations in their plans and even personal disorgani- 
zation as they attempt to translate their scores into behavioral 
constructs. The counselor has an important responsibility in the 
interpretation of test scores for these persons because they do not 
have enough technical competence or toughness of mind to regard 
test performance with proper skepticism. This unqualified and un- 
critical "digestion” of test results may do a serious disservice to 
such persons. Counselors need to assess this tendency to "test
trauma" before they attempt to interpret test scores to counselees. 

In the making of decisions involving vocational and educational 
choices, the inter- and intrapersonal factors of behavior often seem
to be as important as the intellectual and academic. At times the 
decision needs to be resolved in terms not only of what a particu- 
lar counselee can achieve but of how he functions in relation to 
other persons. Industrial psychologists frequently report that un- 
satisfactory performance on the job is more a function of per- 
sonal-social adjustment than intellectual skills’* and it has been 
shown that teachers’ grades may be influenced by interpersonal 
relationships. The counselor will, therefore, need to assess a stu- 
dent’s behavior in the personal-social area as well as in intellectual 
achievement. 

Counselors usually agree that data concerning the typical be-
havior of counselees’ interaction with other persons are essential in
guidance. General recognition is given to the fact that emotions, 
feelings, attitudes, and personal values are of great significance in 
vocational counseling and counseling psychologists have given in- 
creasing emphasis to the development of realistic self-images by 
counselees. They would like to incorporate added insight into these 

** See: R. Pritor. "The Employer Survey and General Education." California
Journal of Secondary Education, November, 1950, 25:45S-440. H. C. Hunt. "Why
People Lose Their Jobs or Aren’t Promoted." Personnel Journal, 1936, 14:227.




areas as well as the more common “aptitudes and abilities” in the 
total vocational counseling process. They are not in agreement, 
however, about how such insights may be obtained. Many turn to 
highly structured instruments, others use unstructured methods, 
and some depend largely on descriptions of behavior by observers. 

Evaluation of structured and unstructured devices is discussed 
in detail in Chapter VIII. The following section will deal with 
one of the descriptive techniques that may be used effectively in
gathering information about the behavior of counselees. 

BEHAVIOR DESCRIPTION 


One promising technique in the assessment of functioning in
the area of personal-social behavior is the Behavior Description
Method developed by the Records and Reports Committee of the
Progressive Education Association in its Eight-Year Study. This
instrument was designed in an effort to get away from the more
common "personality" tests, rating scales, and check lists of be-
havioral traits. It attempted to provide a ". . . definite procedure
for studying attitudes, habits and traits, with a technique for re-
porting them and recording the results."

[...] indicated that this technique, called The
Method of Behavior Description, can provide valuable evidence of
[...] resemble those used in rating scales. Classroom
teachers, and others who have had sufficient opportunity to
observe the pupil, place symbols indicating their relationship to the
student beside the description that best fits him. The accompanying
chart illustrates the method.

[...] and Bert A. Roens. [...]
New York: Dryden Press, 1949, pp. 96-105.



(E-English; MU-Music; HE-Home Economics)


This description indicates that Mary felt secure and was well 
accepted in English and music groups in Grade 7, but had begun 
to show some anxiety about relationship to her peers in those 
classes in Grades 8 and 9. Something happened to cause the girls 
in the home economics class to treat her with indifference in two 
upper grades. A classroom teacher who wanted to help Mary 
would seek to uncover the events leading to the difficulty. In addi- 
tion to the abbreviations placed opposite the descriptive items, 
some teachers will want to add supplementary notes or explain 
what lies behind the descriptions they have given.

The difference between this descriptive procedure and rating is
just that the describers try to summarize what has been observed
while raters attempt to judge the quality of the observed behavior.
There is no implication in the "behavior description method" that
any particular kind of behavior is best for one child at a particular
time. The technique admits the well-known fact that the child’s
behavior may vary in different situations and under changing in-
fluences. Thus, though each reporter makes a correct description
of what he observes, the reports about an individual may differ
greatly at any given time. The procedure allows for the possibility
that differences in the descriptions of various observers may be as
significant as the similarities they report. It must be emphasized
that there is no implication of goodness or badness in the use of
the term "behavior."


Instead of requiring a perfunctory rating of personality twice a
year, a practice that classroom teachers dislike and if possible
avoid, the behavior description method proposes that teachers be
encouraged to make continuous observations of their pupils with
respect to the defined characteristics and to record their descrip-
tions at such times as are decided upon. Duplicated sheets of the
definitions of characteristics are furnished to the teachers so that
they can make their descriptions of the pupils with the definitions
[...] the characteristics used as headings across the top of the page.
[...] the class list [...] the pupil as seen
[...] accompany a description [...] definitions [...]
beginning of the school year [...] to make the descriptions
upon the basis of careful [...] evidence, the descriptions
are likely to be valid.


John W. M. Rothney. "Evaluating and
Reporting Pupil Progress," No. 7, What




With the information provided by the behavior descriptions the
counselor can take another look at his test scores and reevaluate 
them. When they are further supplemented by teachers’ grades, 
social class data, and information on interpersonal factors, test 
scores can be interpreted more meaningfully. 

CONTRADICTORY EVIDENCE: CASES 

In this chapter it has been suggested that the process of using 
test data in counseling is a complex procedure. Many ideas run 
through a counselor’s mind as he looks over the test and academic 
performances of a given counselee prior to the counseling inter- 
view. The interplay of test data with other forces in the process of 
helping a student reach decisions is illustrated in the following 
cases. 


ED 

Ed was a high school senior who wanted to know if he should 
go to college after he was graduated from high school. He 
ranked 33d in a graduating class of 100 students. In six semes-
ters he received 6 A's, 23 B's, 6 C's, and 1 D. His test perform-
ances are recorded below.

The counselor might assume on the basis of the test record that 
there was a poor chance for success in college for this boy whose 
average percentile rank was low. Despite this relatively low level 
of performance on psychological tests, Ed’s scholastic record indi- 
cated that he was consistently regarded by his teachers as an able 
student. (Which predictor does the counselor use in counseling
with Ed? Should he use test scores that indicate little chance of 
success in college or data obtained from cumulative teacher’s marks 
which seem to indicate average or better chances for success? To 


Research Says to the Teacher. Department of Classroom Teachers, American Educa-
tional Research Association of the National Education Association, March, 1955, pp.
20-22.





10th Grade                     Percentile    11th Grade                     Percentile

Henmon-Nelson Test                           Henmon-Nelson Test
  of Mental Ability                10          of Mental Ability
SRA Primary Mental Abilities                 Differential Aptitude Tests
  Verbal                           15          Verbal Reasoning                 45
  Space                            20          Number Ability                   10
  Reasoning                        55          Language Usage
  Number                           30            Spelling                        0
  Word Fluency                     10            Sentences

what extent does the counselor use both sources of data and com- 
bine them?) 

Five years after graduation from high school Ed was complet-
ing his senior year in a Big Ten University where he majored in
art. Although he was making adequate grades in his final year,
he had some trouble during his first year. At that time he
dropped out of school and went for one year to a small teachers’
college. (What other factors in Ed’s cumulative record should
have been added to the data at hand in order to make meaning-
ful interpretation? Is Ed an example of the "overachiever," or
is he the type of counselee for whom tests have little personal
validity?)

Ed’s family had exerted a great deal of pressure on him to go
to college and become a professional person as had all others in
his immediate family group. During this period of pressure they
discouraged his tentative vocational choices, which included
ceramics, woodwork, owning and operating a hobby shop, or
"just working with my hands." Subsequent statements of voca-
tional choice found Ed more and more vague as to just what he
would do after high school graduation, but he did say that he
probably would accede to his family’s wishes and go to college.




Some anxiety was shown in his hesitation about arriving at even
a tentative vocational decision and in his increased dependence 
on the counselor. Discussion of occupational information be- 
came difficult because Ed became too eager to obtain and follow 
suggestions from the counselor. 

Ed’s teachers reported that he tried very hard to please them,
and one described him as the "Boy Scout type," eager, pleasant,
helpful, and compliant to a great degree. They reported, how-
ever, that he was a poor reader. This was an important weakness,
which the counselor and Ed considered carefully in interpreta-
tion of his test scores and in estimating his chances for success in
post high school training. Several teachers had reported that Ed
was a slow, methodical worker who was overconscientious in his
desire to be right. He would not guess at any items. This set in
test-taking seriously inhibited his performance on the tests that 
had time limits and made heavy demands on rapid reading. 

In summary, the counselor could not possibly give Ed a brief
yes or no answer to his questions about possible success in college
on the basis of his test scores alone. Into the equation dominated
by low test scores it was necessary to add the better-than-average
school grades, the reading handicap, and his anxieties and in-
decisions. Perhaps the whole equation would need further
modification in the light of Ed’s lack of clear-cut reasons for
going on to college.
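
The "better-than-average school grades" in Ed's record can be restated as two figures. The following is a minimal sketch using the marks and class rank reported for Ed above; the four-point scale (A=4, B=3, C=2, D=1) and the function names are assumptions made for illustration, not part of his school's record.

```python
# Assumed four-point scale; the school's own weighting is not stated.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

# Marks reported for Ed over six semesters.
ed_marks = {"A": 6, "B": 23, "C": 6, "D": 1}


def grade_point_average(marks):
    """Mean grade points over all marks received."""
    total_marks = sum(marks.values())
    total_points = sum(GRADE_POINTS[grade] * n for grade, n in marks.items())
    return total_points / total_marks


def percent_of_class_below(rank, class_size):
    """Percentage of the graduating class ranked below the student."""
    return 100 * (class_size - rank) / class_size


ed_gpa = grade_point_average(ed_marks)
ed_standing = percent_of_class_below(33, 100)

print(round(ed_gpa, 2), ed_standing)
```

Under these assumptions Ed's 36 marks average just under a B, and his rank of 33d in 100 places about two-thirds of his class below him, figures that sit well above most of his recorded test percentiles.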

Perhaps the one problem referred to the typical high school
counselor with the greatest frequency by teachers is the case of
the pupil who is reported as "not working up to capacity." (Obvi-
ously not many persons work to capacity regularly or mental hos-
pitals would be even more crowded than they are at present.
Continuous working to the limits of capacity would probably harm
the mental health of even the staunchest subject matter specialist.)
Instructors are confronted almost daily with the student who tests
high, but does less than average work in his classes. This fact
causes many teachers to become ego-involved. They can’t under-
stand why the student does not develop the same deep love as they
have for the "Lady of the Lake," the War of 1812 and all its
battles, or the rules for the use of the comma. This is often true of
the students who have achieved high test performances. It seems
to the teachers that such a student is willfully doing poor work,
and they may frequently exhort him to "work up to capacity."
Bert was an example of this type of high school student.

BERT 

Bert came from a lower-class family, which seemed proud that 
they had all quit school before graduation and secured well- 
paying jobs in factories and on railroads. Bert was breaking with 
family tradition by persevering in high school and announcing 
his intentions to stay until graduation. During the three years of 
counseling with him, Bert's counselors had suggested that he 
might profit from college attendance. When he decided to go to 
college the family scorned him. They were disturbed that one of 
their members planned to go to school for four more years 
rather than getting a job and bringing home a pay check. 

Bert, who was the smallest senior boy in his class, had worked 
out his particular mode of adjustment to the world early in his 
school career (before making his decision to go to college) by 
becoming the "worst behaved" boy in school. His teachers did 
not even turn around from writing on the blackboard when they 
heard a disturbance in class. They simply remarked, ". . . Bert,
be quiet!" His reputation as a "cutup" and "wise guy" was
firmly established and Bert proceeded to "get by." He did not
obey the rules of the school or get his assignments in on time.
He indicated that he was uninterested in the class and that he 
saw no need to cooperate with his teachers. As a result Bert's 
academic record showed 3 B’s, 18 C's, and 8 D's over the last 
three years of high school, and he graduated in the lower third 
of his class. 

Bert's test record belied this academic record. Of the 22 tests
that he took in his last three years of school, as shown in the
following chart, only two dropped below the 85th percentile.




When apprised of his test performance, Bert's reaction was, 
"I’m not that good; those tests don’t tell what you can do”; but 
he was obviously pleased. Reassurance and encouragement were 
needed over a three-year period of counseling with Bert before
he was able to integrate into his self-picture the fact that he 
could probably succeed in college. His low record of marks 
caused him some concern when he was discussing college possi- 
bilities. He stated that he knew he could get by in high school 
without pushing himself. Bert had learned that he could, with a 
little cramming at examination time, pass courses in which he
had low marks. He frequently held off in his efforts until final
examination time. By the end of his fourth year of high school,
Bert had his sights upon college training, and although he 
turned down a scholarship in favor of enlistment, he had 
determined to begin a collegiate career after he was released 
from service. 

Still another problem in test interpretation is indicated in the 
case of Bob who suffered from cerebral palsy. Some interesting
speculations about his test performance and the efforts of a coun- 
selor to appraise his chances of success in future training are raised 
in this case. 


BOB

Bob was highly regarded by both his teachers and his fellow
students. He sought the counselor’s aid in determining what he
might be qualified for after graduation from high school. Bob’s
particular interests and leisure-time activities were in the field
of railroad and airplane scale model building, electronics, and
television production. He was particularly interested in ex- 
ploring his chances of successful employment in these fields. 

As early as the seventh grade one of his teachers remarked:
"Bob will never do justice to himself on a test. He does not
work well under pressure, and it takes him longer than normal
to write. During the school year he did the regular work of the
7th grade, although his test record did not show this." At that
time Bob’s Stanford Achievement battery scores indicated a
median grade placement level of 6.8 and his subtest grade place-
ments [...] higher scores were [...]
He had been placed in an [...] through his school
[...] comprehensive set of test scores
appeared on his records below.

[...] were in English
and [...] mathematics [...]





Porteus Mazes
Wechsler-Bellevue Scale I
  Verbal Scale
California Mental Maturity
Differential Aptitude Tests
  Verbal Reasoning
  Abstract Reasoning
  Space Relations
  Mechanical Reasoning



A counselor’s summary of Bob’s high school career, written at
the time a referral was made to a vocational rehabilitation
agency, indicated that most of his vocational counseling had
been centered around some phase of radio and television. Bob’s
major expressed interests were in the production and technical
end of this work. His hobby of scale model building, electrical
wiring, and electronics had been continued during his high
school years. He had used these skills by doing most of the
wiring and electrical work for the school plays. His work
experience outside of school had been as a pinboy in a bowling
alley, a bag-filler in a candy factory, and as a clerk and salesman
in a hobby shop. He preferred the latter and expressed some
interest in a job in that phase of retail trade during his junior
year.

In his senior year, Bob’s choices seemed to crystallize in the
direction of technical electronics. He also explored the idea of
form-maker or scale model builder in metal work or some phase
of the automotive industry. He realized that he could not com-
pete on the open market in such demanding occupations because
of the strain and tensions that would be present. He said that he
felt able to do such work if he did not have to work under
tension or at a rapid rate.

The counselor observed Bob under testing conditions several 
times, and felt that the conditions were such that no valid result 
could be expected. Bob had to strain to make the small marks 
needed on the machine-scored answer sheet, and he was unable 
to make his pencil do what he wanted it to do. His physical 
handicap was so severe that he could not type or write with
enough speed or legibility to pass even beginning English class 
in college. Although he explored the possibilities of having an 
electronic aid to help him, or hiring someone to write for him, 
he seemed to be a poor risk for the usual college training. 

Bob accepted his physical handicap very well. His interest
in electricity, science, model building, and radio had all been
started as therapeutic measures to enable him to gain control over
his hands and the muscles of his fingers. The counselor’s task 
was to determine whether these interests were to be regarded as 
simply healthy compensatory measures or potentially realistic 
vocational possibilities. In this case the test record could not be 
taken at face value. The dual handicaps imposed by the speed 
required, plus his inability to make the proper little marks on 
paper, could make all the results meaningless. Powerful motiva- 
tion had evidently enabled Bob to accomplish, in his wiring and 
radio activities, that which would normally have been judged
impossible. The resolution of this dilemma required serious
examination of some of the assumptions that are commonly
made about test performances.

In the following brief reports some other factors in the inter- 
pretation of test scores are raised. 

DAVE

Dave achieved a perfect raw score of 90 for the 90 items he
attempted on the Henmon-Nelson Test of Mental Ability when
he was a sophomore in high school. He ranked number one out
of some 30,000 sophomores in a state-wide testing program. His
school work was near perfect, and his teachers consistently
awarded him A’s. Dave wrote with a mature style, and his
autobiography prepared for one of his counselors was insightful
and well written. He had tentatively chosen some field of writing
as his vocational goal. Yet Dave received a 15th percentile score
on the Differential Aptitude Spelling subtest. Explanation of
this one low score along with other scores on many tests, all at
the 99th percentile, was hard to make. Dave has since graduated
from a major university. The only exception to his straight A
record was a B in one science course and 4 C’s in ROTC.

MACK 

Mack was a veteran and a junior in a school of education at 
a large midwestern university. He had been a building contractor 
and had planned, built, and sold more than a dozen houses 
before he entered college. It was difficult to interpret his 10th
percentile score on the Bennett Mechanical Comprehension Test 
to him. 


RAOUL 

Raoul, an Indian boy from a tiny Spanish-American village in
the Southwest, achieved a scaled score of 17 on the Block subtest
of the Wechsler Intelligence Scale for Children. He was three
years retarded in grade school because he did not seem to under-
stand his lessons or even see why he was in school. His teachers
were unanimous in their opinions that he was "stupid." Could a
"stupid" person achieve such a high score in a complex mental
function? What could a counselor tell Raoul or his teachers
about his test performance that would be meaningful?

ART 

Art, a sophomore in a premedical curriculum in a college in
the Southwest, scored in the lowest quarter on the American
Council on Education Psychological Examination and the Co-
operative English test. His grade average was a high B, and he
had received A’s in chemistry and college algebra. During 
counseling he revealed very strong hostility feelings toward 
persons in authority positions and mentioned to the counselor 
that the college psychometrician, a former WAC major, made 
him so angry that his testing experience was not typical of his 
usual intellectual performance. 

BILL 

Deep-seated feelings of personal inadequacy and lack of 
personal worth bothered Bill so that he sought counseling. He 
wished for help in assessing his vocational choice of elementary 
school teaching, and wondered if he was intelligent enough to 
achieve his goal. His test scores were all in the top 10 percent 
for college freshmen, and his grades were all A and B, but he
could not seem to accept these as evidence of his capabilities. He
seemed unimpressed with the report of his test performance and
unable to integrate it into his picture of himself, which had for
many years been that of a person of little worth and little ability. 
The task of test score interpretation with Bill necessitated ex- 
tensive counseling aimed at helping him rebuild new attitudes 
toward himself and the world about him. 


BARBARA

Barbara, a high school junior, was a straight A student who
scored above the 85th percentile on all subtests of the Differen-
tial Aptitude Tests and above the 90th percentile on the subtests
of the Cooperative English test. Her vocational choice was
clerk-typist. Her mother had held such positions. The realistic
probability of immediate employment after high school as a
secretary or typist seemed to outweigh the counselors’ sugges-
tions that her test performance and academic record suggested
many more rewarding possibilities. She seemed totally unim-
pressed with her above average test scores and her fine academic
record. Her inability to accept the implications of her above
average performance seemed to block efforts of the counselor
to have her consider other employment opportunities.

The nine cases described above raise the issue of how many 
confirming data are needed to validate the assumption that test 
performances could provide adequate samples of behavior of each 
of the counselees described. In each case the test data would make 
a counselor hesitant about making any prediction about future 
performances of the individual, and the question must be raised 
about the need to interpret test scores in terms of other information 
about the counselee. Many of the current difficulties would seem 
to arise from the fact that counselors are trying to make individual 
interpretations of test performances when the counselee has taken 
a group test administered "by the numbers” and his score is com- 
pared with norms based upon what the average person in a group 
will do. When the individual case is examined, variability from 
rather than conformity to the general rule seems to hold. 

Each of these cases illustrates one or more important considera- 
tions that must be noted prior to using test data for counseling. In 
the case of Bob, his test scores must be interpreted in the light of 
medical and psychological evaluations that are needed to bring 
meaning to his attempts to take tests to answer his questions about 
himself. Ed, Mack, or Art are examples of counselees with lower 
than average test performance that did not seem to be consistent 
with their academic performances. If one were to take the test per- 
formance at face value, great disservice would have been done to 
any one of these students. Each, in his own way, had achieved far 
greater goals than the test record would normally indicate. In 
each case the counselor needed to integrate the test data with other 
information that was contradictory. In Ed’s case the counselor 
needed to assess the pressure to achieve academic goals raised by 
the family, and to speculate about the meaning of the grades re- 
ceived. Were they reflections of Ed’s middle-class background, his 
compliancy, and his willingness to please his teachers? Or did they 
truly represent achievement? Further investigation suggests that 
some of his poor test performances were caused by the tensions 
aroused by the family pressures to succeed. His counselor might 
conclude that the usual timed tests did not have personal validity 
for this boy. 


Bert and Raoul, each in his own way, gave evidence of far more 
talent than his teachers had estimated. Part of this low evaluation 
of school achievement may be laid to their lower socioeconomic 
class backgrounds. In Bert’s case this was strengthened by his 
manifest hostility to the social role of his teachers and to his refusal 
"to play the game" according to their rules as Ed had so well 
learned to do. Barbara, another student from a lower than middle-class 
family, showed this in a different way. Although she had 
learned to achieve the academic standards set by the school, her 
refusal to seek the usual middle-class goals of upward social mobility 
through further education indicates that her personal value 
structure was little changed by contact with teachers. Her prime 
concern was to gain employment immediately upon graduation 
from high school despite the exhortations of teachers that she go 
on to college. 


In Bill's case the test data were meaningless. Any prediction of 
his [...] consideration his feelings [...] until he could gain some 
[...] he would experience difficulty in 
achieving success in anything he attempted. 


"THAT DEPENDS" 

[...] the interpretation of an individual's [...] cry in guidance circles has [...] 

This has caused counselors to distrust anything that does not have 
numbers attached to it. The writers feel certain that presentation 
of a series of Beta weights obtained by multiple regression equations 
would be more warmly received by many counselors than 
the more subjective, more clinical approach, ". . . that depends . . ." 
proposed here. Careful evaluation of the results of quantitative 
prediction for the individual case demands that counselors 
postpone its use for perhaps the next 50 years. Both the counselor 
and the counselee are apt to be deceived by pretensions to pure 
scientific objectivity when using the group measurement tools pres- 
ently available. 

The more "subjective” approach that has been proposed above 
demands counselors with high levels of training, persons sensitized 
to intrapersonal relations, and scientists with feelings of humility 
for their new and still unfinished techniques. The predictive equation 
may begin to balance when the counselor's appraisal of objective 
data about counselees is modified in terms of his experiences 
with like individuals. This flexible approach must be kept open for 
the interplay of insight, "feel," and intuition based upon carefully 
evaluated previous clinical experience and training. The questions 
"By how much?" "To what extent?" and "To what degree?" can 
be answered in part by the counselor's overall knowledge of the 
counselee. His knowledge of what the counselee has achieved in 
past performances and his assessment of the environmental pressures 
imposed upon the counselee's performances may contribute 
to the overall Gestalt of the counselor's diagnosis of a particular 
case. 

At some future time the data involved in the ". . . that depends . . ." 
may be quantified. Until that time, the counselor 
should seek to exploit to the utmost its qualitative potentials. In 
doing so he may help the individual student to find his way and to 

See: Carl R. Rogers. "Persons or Science? A Philosophical Question." American 
Psychologist, July, 1955, 10:267-278. Paul E. Meehl. Clinical Versus Statistical 
Prediction. Minneapolis: University of Minnesota Press, 1954. 

estimate his best bets and best choices among the many avenues of 
action open to him in a free society. 

INDIVIDUALIZED TEST INTERPRETATION 

Throughout this volume, and particularly in this chapter, it has 
been suggested that test interpretation must be done in relation to 
other influential factors in the life history of students. Procedures 
for incorporating various items of background data in test interpretation 
have been discussed above, and some suggestions of how 
these might influence test scores have been made. It has been indicated 
that many personal background variables must be considered 
along with actual test scores when a counselor reaches the point in 
the counseling process at which test information will be of value. 
[...] with other data [...] are controversial [...] When a 
counselor [...] assumed that he has [...] questions of his 
counselee [...] students in [...] some [...] motivation on the 
part of the [...] the use of test [...] and, if this condition holds, 
[...] the counseling process [...] 

See: [...] "Interpreting Test Scores to Counselees." Personnel and Guidance 
Journal, February, 1952, 30:320-322. Barbara A. Kirk. "Individualizing of Test 
Interpretation." Personnel and Guidance Journal, April, 1952, 30:500-505. [...] 
Edward S. Bordin. Psychological Counseling. New York: Appleton-Century-Crofts, 
1955, pp. [...]-279. [...] The Work of the Counselor. New York: 
Appleton-Century-Crofts, [...]. 

Students in high schools are frequently 
given one or more tests without their prior knowledge, consent, or 
interest and the outcomes of this procedure are apt to be disap- 
pointing for all concerned. Very little motivation is aroused by this 
routine, and if resistance and irritation are avoided, indifference 
and boredom seldom are. In such mass testing procedures the one 
most concerned with the outcomes, the student, is seldom informed 
of the results. 

When tests are used as instruments in attempts to find partial 
answers to questions raised by the counselee himself, far more 
rapport may be assumed and a more valid set of scores expected. 
It seems that students who have taken tests under such conditions 
should be apprised of the outcomes of their test performances. 
There is much opinion but little research on the value of doing so. 
One research study has, however, pointed up the fact that when 
high school students were given the results of their test scores 
during individual counseling sessions, their counselors observed no 
significant negative or disturbing effects. Most students responded 
to this procedure with enthusiasm and appreciation. They indicated 
that they were thankful for help in establishing a more realistic 
picture of themselves through the counselor's aid. 

Interpretation of test scores to counselees may provide partial 
answers to questions that have been raised in the counseling itself. 
The counselee may have asked, ". . . have I enough background 
in mathematics to consider a career in science?" The counselor's 
report of a test score in this area may provide a partial estimate. 
Test scores may also assist a counselee to see his test performance 
as a reflection of his expressed feelings about himself. Full consideration 
of this topic is beyond the scope of this book, but the reader 
is referred to reports by Bordin and Tyler, who have presented 

See: Ralph Bixler and Virginia Bixler. "Test Interpretation in Vocational Counseling." 
Educational and Psychological Measurement, Winter, 1946, 6:145-155. Paul 
E. Dressel and R. W. Matteson, "Effective Client Participation in Test Interpretation." 
Educational and Psychological Measurement, Autumn, 1952, 4:693-706. 

Rothney, op. cit., p. 322. 

Bordin, op. cit., pp. 274-276. 

Tyler, op. cit., pp. 159-166. 




careful discussions of this matter. It is sufficient to say that valua- 
ble clues to the counselee’s perceptions of self and his attitudes 
toward himself may be gained by discussion of his expected and 
actual test performances. The clues and hunches obtained may help 
in the attainment of fuller understanding of and by the counselee. 

Since clear communication between counselor and counselee is 
essential, extreme care must be taken to use language that has real 
meaning to both. Technical terms such as IQ are subject to dis- 
tortion or misunderstanding and should never be used. Generalized 
statements that avoid the inference the student has or has not 
ability to achieve at specific levels should be employed. 

Test interpretations must be made in individual conference with 
the student. If attempts at interpretation are made in group ses- 
sions by teachers or others, there is much danger of misinterpreta- 
tion and distortion by the student who is anxious about what his 
test results will foretell of his future. 

During individual conferences the counselor should refresh the 
[...] a copy of [...] score and show 
how it compares with other students who have entered the 
kind of training the student is considering. The student may be 
told that a given [...] is such that [...] scored as high as or 
higher than he did. If predic[...] table described earlier [...] 
it may emphasize the fact that test scores [...] other information is 
needed before [...] can be answered. [...] the whole counseling 
[...] development by the student of greater maturity [...] rather 
than upon the use of an added device that purports to give quick 
and final answers to 
current problems. When thus relegated to their proper place, tests 
can be helpful in the counseling process. 
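
The comparison described above, telling a student roughly how many members of a reference group scored as high as or higher than he did, is simple arithmetic on a percentile rank. The short Python sketch below is offered only as an illustration. The function names, the wording of the statement, and the rounding to a count per ten are assumptions made here, not a procedure given in the text, and the percentile rank is assumed to mean the percent of the norm group scoring below the student.

```python
def share_at_or_above(percentile_rank: float) -> float:
    """Approximate percent of the norm group scoring as high as or higher.

    Assumption: the percentile rank is the percent of the group scoring below.
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    return 100.0 - percentile_rank


def plain_statement(percentile_rank: float, group: str) -> str:
    """Hypothetical helper: phrase the comparison without technical terms."""
    per_ten = round(share_at_or_above(percentile_rank) / 10)
    return f"About {per_ten} of every 10 {group} scored as high as or higher."


if __name__ == "__main__":
    print(plain_statement(90, "entering students"))
```

Such a statement remains a comparison with a group. As the chapter stresses, it needs to be read alongside the other information available about the counselee.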

SUMMARY 

In this chapter some issues involved in interpreting test scores to 
counselees have been raised. Particular attention has been given to 
the way nontest data may be used with test scores in interpretations. 
The difficulties of basing predictions upon test scores alone have 
been explored and the counselor has been reminded of the various 
sources of error in such estimates. The effect of nontest variables 
upon prediction processes was explored, and special emphasis was 
given to the estimation of the influence of school marks, social class 
position of the student's family, and various intrapersonal factors 
upon the total counseling process. Several cases were presented to 
illustrate the interaction of these forces in counseling. Emphasis 
was placed upon the necessity of understanding all aspects of a 
counselee’s behavior. In the following chapter, discussion will 
focus upon attempts to measure personality, attitudes, and interests 
by use of structural and projective methods. 

Exercises 

1. Develop in a role-playing session (with various students acting 
the part of the counselor, the parent, and the counselee) test 
score interpretation for George, a high school senior, who is con- 
sidering entrance to a school of engineering. Assume in the first 
session that the following test scores are all you have to use. No 
other data about George are available. 

2. Reconsider your role-playing interview with George and his 
parents with the additional data supplied by the following semester 
school marks earned in high school. 

3. Again reconsider the role-playing interview with George with the 
addition of the following information taken from the counselor’s 
files. 



Table 22. Test Scores and Norms of a Counselee 

Tests                                          Grade  Score                        Norms 
Differential Aptitude: Numerical Ability       12     9[?]th percentile            12th boys 
Differential Aptitude: Mechanical Reasoning    12     45th percentile              12th boys 
Differential Aptitude: Space Relations         12     30th percentile              12th boys 
ACE Psychological: College Level               12     90th, 70th, 80th percentile  4-year college freshmen 
  (Q Score, T Score) 
Iowa Silent Reading                            10     40th percentile              local norms 
Otis Gamma                                     10     Otis IQ 105                  national norms 
Stanford Achievement, Advanced (J)             8      Battery Median 8.4           national norms 

Stanford Achievement subtests, in the order printed: Paragraph Meaning, 
Word Meaning, Spelling, Language, Arithmetic Reasoning, Arithmetic 
Computation, Social Studies, Study Skills. 

Stanford Achievement subtest scores, in the order printed: 7.0, 8.4, 8.0, 
9.4, 9.0, 10.0, 8.2, 8.5, 7.0. 


Table 23. Actual Grades of Student of Table 22 

Rank in Class at Graduation: 15th in a Class of 120 

Subject                                  Marks 
English                                  4 A's, 4 B's 
Mathematics: Algebra                     A, B 
             Geometry                    C, D 
             Trigonometry                C 
Language: Spanish                        F, D 
Social Studies: Civics                   A, B 
                World History            B, B 
                U. S. History            C, B 
                Problems of Democracy    A, A 
Science: General Science                 C, C 
         Biology                         C, B 
         Chemistry                       D, C 
         Physics                         C, C 
Journalism                               A 
Speech                                   B 
General Business                         A 
Typing                                   B 
Band                                     5 A's, 3 B's 

Family Background: Father, newspaper publisher of country weekly, 
8th grade education. Mother, college graduate, 
housewife. Older sister, Junior in College of 
Education. 

Health: Slim, but tall for age. Frequent attacks of 
asthma as a child. Persistent trouble with 
sprained ankles in high school athletics. General 
health excellent. 

Interests and Activities: Dates with girls at every opportunity. President 
of many school clubs, DeMolay, Hi-Y. 
Collects match boxes. Likes dancing, tennis, 
badminton, travel. 

Work Experience: Works for father in press room (a job he dislikes). 
No other work. 

Problem Areas: Father insistent on engineering because of high 
status and high pay. Boy neutral or passive 
toward this selection. When asked for alternate 
choice, George was unable to say other than 
"working with people in some way." 

4. Read the following case of Mike. "I believe the one reason why 
my son Mike is attending the University," wrote his mother to a 
school official, "is because at one time you told him he had the 
ability, but it was up to him to use it. That was all the encouragement 
he needed. After graduation he attended summer school and 
took a mathematics course which he knew he didn't know very 
well and finished with a 92 average. Besides he worked and saved 
enough money to pay his entire tuition and expenses. He is doing 
very well at the University. I want to compliment you on your fine 
work." 

No one, at the time Mike was graduated with a rank of 186 in 
a class of 353, would have predicted that he would go on to a university. 
In fact, one month before he was graduated Mike said that 
he expected to go to work in a factory or as a handyman in a 
hotel. He said that if things went well with him he would be 
"just lucky." Yet Mike surprised everyone, even his mother, and 
certainly his teachers. He was one of those persons who make 
prediction of human behavior a very hazardous procedure. 
















I have never known my father as he died in an automobile 
wreck before I was six weeks old. I have only known one 
relative on my fathers side, my uncle one time congressman 
for the state. I have also been told that my Grandfather was 
once govenor of a state. 

My Mother ran a shop somewhere in the city. I remember 
little of this but two things have stood out quite clearly, one 
was that I had gotten in to a fight with another boy about my 
age. This boy pick up a board which had a nail in it and the 
nail price my eye. Everything turn out alright as the nail had 
missed the pupil. The only other thing I remember of those 
yrs. was the time I had a tooth pull. 

During the depression my mother sold the shop got work 
as a housekeeper. 

I had now started to school. 

Well It didn't take look when I found a friend in this 
school and we went together in to stores and started to lift 
candy. We were soon caught & I probably was never so scared 
in my life. 

There was a dog in my life a that time teddy was his name 
he was old & blind but was dreadfully smart 

We then moved to a new house and mother took in my 
Grandmother as she was very ill. I remnber brief but Pleasant 
visits to my grandfather’s in another town. For a summer or 
two we live a lake five, a small wooded lake part of which 
is now a game refuge. 

When I reached the age of eleven mother married so we 
move to another town. 

I supposed I fooled around like the average boy the only 
important thing was that the war was soon over with. You 
see I had remenber Pearl harbor and I remenber mother read- 
ing the papers to me when hilter had first started waring on 
neigrning countrys in 38 or 39. 

I had used to sat near the window & watch what was than 
New car as the went down the street as she read. 

A little while after the Pres. death we had brought a farm 
near here were I now live. I have travel quite bit these last 
few years too. That about all the important things. 

Despite Mike's lack of polish in his written work he received 
average marks in English during his high school career. The 
English teacher in his junior year said: "His spelling is barbarous, 
but what he says is really bright. He has unusually fine taste." 

Mike had always been required to work hard at part-time jobs. 
He did farm work, clerked in a store, and, during his senior year, 
worked as a houseman at a neighboring city hotel three or four 
evenings a week and all day on Saturdays and Sundays. Since, on 
the evenings he worked, he did not get home until one o'clock 
in the morning, he was often sleepy in class and seldom got his 
homework done. As the result of an accident, in which he suffered 
some concussion, Mike had frequent headaches. This condition 
and the lack of sleep resulted in the description of him by his 
teachers as the "epitome of apathy and lethargy." 

Mike did not seem disturbed when he was told what he had 
done on the tests. He did say that he had one of his headaches 
when he took the Primary Mental Abilities Test, and that he 
always expected to do poorly in any form of mathematics. He 
liked algebra least of all the subjects he took in high school and 
Spanish was not interesting because he could not get good grades. 
English was one of the subjects he liked best because, he said, "I 
like to read and write stories." Biology was also a favorite subject. 
Because at one time he thought he might go into defense work 
after graduation, Mike added to his school and work load by 
taking an evening course in welding at the city vocational school 
two nights a week. The additional time meant even less rest than 
he had been accustomed to, and he became even sleepier and more 
lethargic during his senior year. 

His test record follows below. 

Mike was "on his own" as far as activities were concerned. 
Early interests in chemistry and stamp collecting soon vanished 
when he raised enough money to buy a cheap, old car. He liked 
to sing and play baseball, but he took no part in school activities 



Tests                                    Percentile 
                                         (Grade 10, Grade 11, and Grade 12 
                                         columns; placement not recoverable) 

Henmon-Nelson Test of Mental Ability     58, 75 
Reading Tests 
  Progressive Reading Vocabulary         [...] 
  Progressive Reading Comprehension      [...] 
  Cooperative Speed of Reading           [...] 
Primary Mental Abilities 
  Verbal                                 50 
  Space                                  32 
  Reasoning                              15 
  Number                                 9 
  Word Fluency                           12 
Differential Aptitude Tests 
  Number Reasoning                       [...] 
  Sentences                              [...] 
  Mechanical Reasoning                   [...] 
  Space Relations                        [...] 
  Clerical Speed and Accuracy            [...] 

[...] years accelerated 
2 years accelerated 





in those areas because, he said, "I guess I'm a little lazy on those 
things." He liked to read a lot of "any kind of books except 
cheap novels.” His best experience was a two-week trip with his 
uncle to Chicago where he enjoyed seeing things, "especially skid 
row." 

In a check list of activities, he marked the following items in 
the frequency given. 











Riding [...] 
Radio, Movies, Television: Once a week 
Tennis, Hiking: Twice a week 

He said that he could go out any evening that he wished and 
could come home at any time that he chose on any night of the 
week. He refused to fill out a diary of activities illustrating a 
typical week but he said he was seldom at home, and he implied 
that no one was much concerned whether he was there or not. 

Mike did not remember his father. His stepfather merely 
accepted him, but neither showed much concern about the other. 
His half-brother and half-sister were nine and twelve years 
younger, and he was indifferent about their activities. His mother, 
who had had only an elementary school education and who had 
struggled hard to make her own way in depression years, was 
convinced that Mike should go on to training beyond high school. 
She saw no way of providing financial assistance, and he seemed 
completely reconciled to the idea that he would have to make his 
own way as he had done throughout his high school career. His 
stepfather, who had been graduated from high school and at- 
tended a vocational school, was a part-time farmer and carpenter 
who saw no reason, even if he had been able to do so, to support 
his stepson. Mike said that he never talked about his hopes and 
plans at home. 

Any plans for Mike’s future would be influenced by his appear- 
ance and manner. He was slow-moving and slow-talking. He had 
a serious case of acne. It seemed as though he had slept in his 
clothes and had not had time to comb his hair before coming to 
school. And this condition did not change significantly at the time 
when most of his contemporaries became concerned about their 
appearance. He seemed to be very sleepy at all times, and the 
general impression he gave was one of a "lost soul" who seemed 
to lack confidence in himself and about whom nobody seemed to 
care. His two closest friends had notorious disciplinary records, 
and Mike said that they would describe him as "quiet in class but 
reckless otherwise.” From observations of Mike in school, the 
term "reckless” would just never seem to apply. 

When he was a sophomore, Mike said that he would like to 
join the Air Corps for three years and then become a farmer. He 
continued to express interest in the armed forces in his junior year 
because, he said, he would learn to "take and give orders and to 
learn to think in a clutch." When he was a senior he had changed 
his mind. At that time he indicated he would not enlist. "It’s long 
enough when you’re drafted,” he said. During the first half of 
his senior year he seemed very uncertain about the future. He 
showed some interest in going to college to study accounting, and 
he wanted to take some tests to determine whether or not he had 
the aptitude for it. The pros and cons of several occupations were 
discussed when he said that he wanted to learn more about them. 
In the last month of his senior year, however, he said that he 
would take a job in a factory. No one predicted that Mike would 
set up a plan to work his way through college and to carry it 
through in the manner described by his mother in the first paragraph 
of this case study. 

a. Comment on Mike's two Henmon-Nelson scores, i.e., 58th 
percentile and 75th percentile. Why the disparity between 
them? 

b. What personal factors might account for the difficulties in 
prediction for the future with Mike? 

c. Suppose Mike had come to you as his counselor in his 12th 
year of school and asked your opinion about his chances for 
success in a university. What could you have told him? 


REFERENCES 

Berdie, Ralph. After High School, What? Minneapolis: University of 
Minnesota Press, 1954. 

Caplow, Theodore. The Sociology of Work. Minneapolis: University 
of Minnesota Press, 1954. 

Carter, Ralph S. "How Valid Are Marks Assigned by Teachers?" 
Journal of Educational Psychology, April, 1952, 43:218-228. 

Eells, Kenneth, and Others. Intelligence and Cultural Differences: A 
Study of Cultural Learning and Problem Solving. Chicago: University 
of Chicago Press, 1951. 

Guilford, J. P. Fundamental Statistics in Psychology and Education. 
New York: McGraw-Hill, 1956. 

Gustad, John W. "Test Information and Learning in the Counseling 
Process." Educational and Psychological Measurement, Winter, 
1951, 11:788-795. 

Heimann, Robert A., and Schenk, Quentin. "Relations of Social Class 
and Sex Differences to High School Achievement." School Review, 
July, 1952, 49:213-221. 

Hollingshead, August. Elmtown's Youth. New York: Wiley, 1949. 

Hollingshead, Byron. Who Should Go to College? New York: 
Columbia University Press, 1952. 

Hull, Clark. Aptitude Testing. Yonkers, N.Y.: World Book, 1928. 

Leonard, W. N. "Psychological Tests and the Educational System." 
School and Society, April 12, 1952, 75:225-229. 

[...] "[...] Substantial is a Substantial Validity 
Coefficient." Personnel and Guidance Journal, February, 1956, 
34:[...]. 

Meehl, Paul. Clinical Versus Statistical Prediction. Minneapolis: University 
of Minnesota Press, 1954. 

Rothney, John W. M. "Evaluating and Reporting Pupil Progress." 
[...] New York: [...] 

Thorndike, Robert L. Personnel Selection. New York: Harper, 1949. 

Warner, W. Lloyd, Havighurst, Robert J., and Loeb, Martin B. 
Who Shall Be Educated? New York: Harper, 1944. 

Wesman, Alexander. "Better Than Chance." Test Service Bulletin. 
New York: The Psychological Corporation, 1953, 12 pp. 

Wolfle, Dale L. America's Resources of Specialized Talent. Report of 
the Commission on Human Resources and Advanced Training. New 
York: Harper, 1954. 



CHAPTER VIII 

Personality Questionnaires and 
Interest Inventories 


The terms "inventories" and "questionnaires" are used in the 
title of this chapter to emphasize the fact that there are no interest 
or personality tests in any sense of that word. Although there are 
instruments that bear such titles and the terms "personality" and 
"interest" tests are commonly used by educators, personnel workers, 
and psychologists, there is really no justification for their use. These 
instruments are either questionnaires such as the Kuder Preference 
Record and the Strong Vocational Interest Blank or controlled 
interviews such as the individual form of the Minnesota Multiphasic 
Inventory and the Rorschach. As questionnaires and [...] 
[...] recognize the 
difference between inventories and tests and that they stress this 

[...] Personality, California Test Bureau. The [...] 
[...] published by Rutgers University Press and Gryphon Press. 

difference in their work with counselees, to prevent the common 
tendency to misinterpret the scores. 

Counselees who have heard about "scientific" tests of aptitude in- 
sist on reading into questionnaires and inventory scores something 
that was never intended by most of their authors and for which 
there is absolutely no justification. They tend to think that they 
have taken a vocational fitness test or a measure of aptitude for 
some educational experience. Since these instruments use formats 
similar to tests, and since they have scores and norms, they encour- 
age this kind of misinterpretation. It is most unfortunate that au- 
thors and publishers of inventories or personality questionnaires 
have not taken enough action to prevent the misuse and misunder- 
standing that are common among those who take them, and even 
among those who administer them. 

There is a great deal of evidence that pe rsonality questi o nnaire s, 
controlled interviews, and interest inv«itoriesjare„ widely, used in 
coui^lingTjust why this should be so in view of the demonstrated 
in^equacies of these devices is difficult to understand. It seems 
that it must be a combination of amazing, psychometric innocence 
o ^he p art of the users, nwyete in_ronsidering, the counseling job 
as a "quickie” affair rather than a complex longi_tudinal problem, 
mistake falffTin^tatStics on the part of inventory producers and , 
consumers, expediem^ and a desire to keep up with the other 
fellow who uses them for any of the above reasons. Perhaps 
another reason for their popularity can be found in.,the-seeming- 
exactness they give to the counselor’s work. Counseling interviews 
.m?/’ Jinarr jjalsvAificaU/' Awayib J;!? 

colleagues or clients, but an arr ay of scores seemingly supported by 
pedanti c jargon might possibly do so. The popularity of the instru- 
ments may be due in part, then, to the psychological support that 
counselors, working in a relatively new area, and without adequate 

• See, /or exaropJe, Ralph F. Bexdie, 'The SUte-Wide Testing Programs.” Personnel 
and Guidance Journal, April, 1954, 32:454-459; J- R. Berkshire and Others, 
■'Test Pre/eience in Guidance Centers." Otcupaiiont, March, 1948, 26:337-343. 



CHAPTER VIII 


Personality Questionnaires and 
Interest Inventories 


The terms "inventories" and "questionnaires" are used in the
title of this chapter to emphasize the fact that there are no interest
or personality tests in any sense of that word. Although there are
instruments that bear such titles¹ and the terms "personality" and
"interest" tests are commonly used by educators, personnel workers,
and psychologists, there is really no justification for their use. These
instruments are either questionnaires such as the Kuder Preference
Record and the Strong Vocational Interest Blank or controlled
interviews such as the individual form of the Minnesota Multiphasic
Inventory and the Rorschach.² As questionnaires and interviews
they possess all the limitations as well as some of the
advantages of such techniques but they should not be described or
thought of as tests. It is essential that counselors recognize the
difference between inventories and tests and that they stress this
difference in their work with counselees, to prevent the common
tendency to misinterpret the scores.

¹ See, for example, The California Test of Personality, California
Test Bureau. The Thematic Apperception Test, Harvard University
Press. The Rogers Test of Personality Adjustment, The Psychological
Corporation.

² There are more than one hundred such instruments. See those
listed in the several Mental Measurements Yearbooks by O. K. Buros,
published by Rutgers University Press and Gryphon Press.

Counselees who have heard about "scientific" tests of aptitude
insist on reading into questionnaires and inventory scores something
that was never intended by most of their authors and for which 
there is absolutely no justification. They tend to think that they 
have taken a vocational fitness test or a measure of aptitude for 
some educational experience. Since these instruments use formats 
similar to tests, and since they have scores and norms, they encour- 
age this kind of misinterpretation. It is most unfortunate that au- 
thors and publishers of inventories or personality questionnaires 
have not taken enough action to prevent the misuse and misunder- 
standing that are common among those who take them, and even 
among those who administer them. 

There is a great deal of evidence that personality questionnaires,
controlled interviews, and interest inventories are widely used in
counseling.³ Just why this should be so in view of the demonstrated
inadequacies of these devices is difficult to understand. It seems
that it must be a combination of amazing psychometric innocence
on the part of the users, naïveté in considering the counseling job
as a "quickie" affair rather than a complex longitudinal problem,
mistaken faith in statistics on the part of inventory producers and
consumers, expediency, and a desire to keep up with the other
fellow who uses them for any of the above reasons. Perhaps
another reason for their popularity can be found in the seeming
exactness they give to the counselor's work. Counseling interviews
may seem not scientifically respectable enough to impress one's
colleagues or clients, but an array of scores seemingly supported by
pedantic jargon might possibly do so. The popularity of the
instruments may be due in part, then, to the psychological support that
counselors, working in a relatively new area, and without adequate
evidence of their effectiveness, may feel that they need; and the
round-the-clock hucksterism in the sales of the instruments must
account in large measure for their widespread use. Certainly it
cannot be justified on the basis of logical reasoning or experimental
evidence.

³ See, for example, Ralph F. Berdie, "The State-Wide Testing
Programs." Personnel and Guidance Journal, April, 1954, 32:454-459;
and J. R. Berkshire and Others, "Test Preference in Guidance
Centers." Occupations, March, 1948, 26:337-343.


LIMITATIONS OF SHORT-CUT METHODS

Various writers have demonstrated that all the instruments for
measuring personality and interest have such serious limitations
that their use seems likely to do more harm than good. Among
those limitations the following have been noted.

1. Scores can be faked deliberately or without awareness.

2. Titles of the instruments are simply christenings by their
authors.

3. The vocabulary used is a source of confusion to subjects who
take the instruments.

4. Many of the instruments force the subjects to make choices
among items about which they have neither knowledge nor concern.
They may also require choices among items of unequal familiarity.

5. Use of particular scoring methods in order to get so-called
"objectivity" limits the subjects' expression of enthusiasm or
concern.

6. Statistical methods used in construction and norming of the
instruments are questionable.

7. Results are subject to misinterpretation by those who take the
instruments.

8. Evidence of the predictive validity of the instruments is either
nonexistent or questionable.

9. They rely on self-estimates known to be highly invalid.

10. The cultural background of the subject is not given adequate
consideration.

11. They suggest stability of personality and hence encourage
counseling of subjects as they are without consideration of
what they may become.

12. In extreme cases of high interest or disturbed personality
they may simply elaborate the obvious.

13. They discourage experimentation because they seem to
provide a large quantity of numerical scores for rapid calculation.

ATTEMPTS AT JUSTIFICATION OF SHORT-CUT METHODS


Before consideration is given to each of the limitations listed
above it should be noted that interest and personality testers rely
frequently on a superficial kind of reasoning that sometimes seems
to justify the use of their instruments. It usually runs something
like this: "If you are going to be successful in any undertaking you
must have interest in it or the personality for it, as well as ability to
do the required tasks. It will be necessary then to inventory your
interests and personal traits to see if you have them. If your
inventory and ability scores are high then you are likely to succeed in the
tasks you are about to undertake." From this kind of statement they
then go on to administer the instruments and counsel on the basis
of the results. Typical of such statements is the following.


The fundamental reason why general clinical counselors and
researchers in personnel work have devoted increasing attention to
"interest" lies in widespread common observations of workers in
different fields. Such observation reveals simply that, granted the
presence of abilities commensurate with the demands of the job,
workers are successful, happy in, and satisfied with their jobs if they
feel at home and at ease with those with whom they have to work.
This feeling of at-homeness arises from having strong and large
areas of common interests which spread far beyond the borders of
on-the-job behavior. . . . If, as observers, we move suddenly from
association with preachers to hobnob, say, with horse-racing
stablemen, liquor salesmen, or ballet dancers, we are struck at once by the
marked difference in the interests of each group. We may readily and
quite accurately conclude that a successful worker in one of these
occupational sets would feel like a fish out of water if he attempted
to carry on the job of another. A preacher, for example, might have
all the fine build, muscular coordination, sense of rhythm, and
timing of a potentially great ballet dancer, but his interest would wholly
inhibit him from trying to be or succeeding in becoming one.⁴ (Italics
added.)


From this kind of illustration in which extremes of interests in
occupations not covered by interest inventories are used for
illustration, and in which such impossible situations as comparing a
person's feelings to a "fish out of water" are used, the authors in this
typical comment jump to the clincher in these words: "One of the
most important functions of the counselor in educational or
industrial practice, therefore, is helping of individuals to match their
aptitude and ability patterns with their interest patterns."

From such statements comes the attempted justification of having
students respond to hundreds of items by throwing a circle around
an L, an I, or a D to indicate liking of, indifference to, or dislike
for such items as the names of occupations, school subjects,
amusements, activities, peculiarities of people, listing of preference for
activities, comparisons of two activities, and self-rating of
characteristics. The scores are then summed and the total is interpreted to
indicate the similarity of interest of the client with those who have
been successful in the occupations indicated, but not for preachers,
horse-racing stablemen, liquor salesmen, or ballet dancers. Tyler⁵
and Hahn and MacLean⁶ have pointed out that one of the most
commonly used instruments, the Strong Vocational Interest Blank,
is not appropriate for investigating interests of youth much below
the age of 17, and is not as useful in the high school as in the
college situation. Its range of occupational keys outside the
professional fields is so limited that it is not informative for the majority
of students who will not go into professions.

⁴ Milton E. Hahn and M. S. MacLean. Counseling Psychology. New York:
McGraw-Hill Book Co., 1955, p. 19^

tofa'wH J’iji ‘'ppi'io-coaur-

The long jump from the generalizations from "common observation"
to elaborate profiles indicating various degrees of so-called
vocational interest or personality adjustment is made quickly. It is
implied, despite the limitations noted above, that the inventory or
questionnaire makes the jump possible and that counseling can be
based in part upon them. Although careful writers surround their
statements about the value of such devices with cautions about their
exclusive use, many still cannot resist such statements as these.

If pressure of time forces us to use a single method, most
counselors prefer to depend upon a well-standardized measurement because
of its greater demonstrated reliability and validity.⁷

It may be generalized that some caution is required in basing
vocational decision upon interest test results during the early high school
years, and that repeated measurements of interests is desirable for
adequate guidance procedures at this age level.⁸

Anyone who is in a position to take a personality inventory can, if
he so desires,   honestly and thus get a reasonably
adequate   for personality improvement.⁹

In the case of personality traits or mental health, use has been made
of the judgments of individuals who are intimately acquainted with
the subject being tested. However, the responsibility for such
judgments has been such as to render them well-nigh useless . . .
carefully constructed personality inventories thus are probably in some
instances more valid than criteria with which they are correlated.¹⁰

Projective methods are peculiarly appropriate to schools because
you can give them without exciting the subject and arousing
emotional resistance or disturbance. . . . They can be used for some of
the perplexing disciplinary problems; namely to find kleptomaniacs,
perverts, liars, those who write obscene notes or circulate obscene
drawings because you can locate such individuals. They will reveal
themselves in various ways, as in the T.A.T., for example. They can't
help but tell what is going on inside, as has frequently been shown.¹¹

⁷ Hahn and MacLean. Ibid., p. 230.

⁸ B. Jacobs. "Stability of Interests at the Secondary School Level."
Educational Records Bureau Bulletin, July, 1949, No. 52, pp. 8>-87.

⁹ L. P. Thorpe. The Psychology of Mental Health. New York: The
Ronald Press Co., 1950, p. 252.

¹⁰ L. P. Thorpe. Ibid., p. 648.

The high school or college teacher who is attempting to help his
students in the choice of future occupations may find either or both
of these tests (Kuder and Strong) useful.¹²

Clear-cut interest types seem to exist, and can be isolated on this
test as early as the tenth grade. The interest types have definite
personality correlates, somewhat indicative of the statement that birds
of a feather flock together. Over a period of time there is considerable
stability of measured interests.¹³

When an individual pupil is being counseled with regard to
college preparation, if evidence of high aptitude in a certain field, let us
say science, is supported by objective evidence of interest in the same
area, the pupil may be advised with greater confidence to consider
majoring in this area in college, for it is known that interest in a
field increases the likelihood of success in that field.¹⁴


OBJECTIONS TO SHORT-CUT METHODS 


Many workers in the field of education, psychology, and guid- 
ance have voiced opinions that are completely in contrast to those 
cited above. Samples of such statements follow.


It is certain that the matching of internal questionnaire factors and
external behavior factors is systematically more difficult than
personnel workers imagine. . . . The self-inventory represents the nadir of
scientific invention and subtlety.¹⁵

¹¹ Lawrence K. Frank. "Understanding the Individual Through
Projective Techniques," in Arthur E. Traxler (Ed.). Goals of American
Education. American Council on Education Studies, Series I, No. 40.
Washington, D.C.: American Council on Education, April, 1950, p. 61.

¹² W. C. Morse and G. M. Wingo. Psychology and Teaching. New York:
Scott, Foresman & Co., 1955, p. 402.

■■Jota 0. DuW. ■T.,0 joa Pramml W„k io Coll.g.," N.w
Mm.min, .,d Areno.; ^ EJ.uUm Stuio No. 20 wib-
in^ton, D.C: American Council on Educatioa, August, 1944

An overwhelming majority of publications presuming to be validity
studies (of structured tests of personality) reported significant
findings. It was difficult, however, to get a clear picture of the validity of
the inventories. Frequently investigators ran a whole series of
significant tests involving one or more inventory scales and, finding that
one or two out of 20 differences or relationships reached an
acceptable level of probability, claimed some validity for the inventory.
What they apparently failed to realize is that such a series of
significant tests is also subject to sampling considerations, and that the
relatively few successful instances may have arisen purely by chance.
In any case the need for cross-validation and replication is great.¹⁶
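The arithmetic behind the caution just quoted can be sketched briefly. The fragment below is an illustration only and is not drawn from the studies cited. It assumes that the 20 tests are independent and that the "acceptable level of probability" is the conventional .05 level, neither of which is stated in the quotation, and the function name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k, n_tests=20, alpha=0.05):
    """Chance that at least k of n_tests independent tests come out
    'significant' at level alpha when no real relationship exists.

    Assumption (not from the source): tests are independent, so the
    number of chance 'hits' follows a binomial distribution.
    """
    # Probability of fewer than k chance hits, summed term by term.
    p_fewer = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    for k in (1, 2):
        print(f"At least {k} of 20 significant by chance: {prob_at_least(k):.3f}")
```

Under these assumptions chance alone would yield at least one "significant" result in about 64 such series out of 100, and at least two in about 26 out of 100, which is why one or two successes in 20 trials offer little support for a claim of validity.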

The scientifically minded research worker who reads a sample of
the studies reported above [on projective techniques] is likely to feel
at least mildly disturbed. He will want to point out that research
workers with projective techniques often omit control groups or cases,
leave out important descriptions of their subjects and methods, do
not use enough subjects, employ faulty designs, pay little attention to
norms, continue to employ concepts that research has shown to be
faulty, interpret the product of "test plus user" effect as a product of
the test itself, describe criteria vaguely, present tentative reports as
if they were final, and often introduce irrelevant information into
their studies.¹⁷

It would be helpful if interest inventories had built into them or,
more realistically, if psychologists had available for coordinate use,
measures which would differentiate the interest profile which
adequately reflects an adequate self-concept from the interest profile
which is the result of inadequate perception either of self or the
world of work.¹⁸

¹⁵ Raymond B. Cattell. Description and Measurement of Personality.
Yonkers, N.Y.: World Book Co., 1946, pp. 341-342.

¹⁶ Edward J. Furst and Benno G. Fricke. "Development and
Applications of Structured Tests of Personality." Review of Educational
Research, February, 1956, 26:26-55, p. 27.

¹⁷ John W. M. Rothney and Robert A. Heimann. "Development and
Applications of Projective Techniques." Review of Educational
Research, February, 1956, 26:55-71, p. 66.

VALIDITY OF INTEREST INVENTORIES 


Now that the reader has seen samples of contrasting opinions 
on the value of interest inventories, personality questionnaires, and 
structured or projective interviews for assessment of interest and 
personality, he may want to consider evidence about their utility. 
The counselor will look first at the evidence of validity and, for 
his purposes, particularly at data on predictive validity. It may 
come as a shock to find that, although the instruments are widely
used in guidance, the evidence of validity is scant.

It is common practice to report as evidence of validity of interest
inventories the finding that some scores, say of nurses, are slightly
higher than others on social service scales, or that accountants score
significantly higher on a computational scale than persons who are
not in that occupation. The counselor will not be impressed with
such data. He will tend to dismiss the slight relationships reported
as elaborations of the obvious, since persons who do not like
computation are unlikely to enter an occupation requiring it if they can
resist pressures or know enough about it.¹⁹ He will note that, even
if there are significant differences between groups in and out of an
activity, there can be much overlap between their scores and that
his particular counselee might be one of those cases in the area
under the curves of distribution in which there is no discrimination
between groups. He will look for evidence that inventory scores
obtained by expenditure of much money and time will tell him

¹⁸ Donald E. Super. "The Measurement of Interests." Journal of
Counseling Psychology, Fall, 1954, 1:171-179.

¹⁹ It has been found, for example, that about half of a sample of
university students do not show clear-cut   of interests. See: John G.
Darley. Clinical Aspects and Interpretations of the Strong Vocational
Interest Blank. New York: The Psychological Corporation, 1941,
pp. 19-20. See also: Edward A. Lincoln. "The Insignificance of
Significant Differences." Journal of Experimental Education, March



counselor of high school students or college freshmen. The in- 
terpretation of such data to counselees is an almost impossible task. 

If the counselee is told that persons who have completed such long 
and costly training for, say, accountancy, and have been successful
in their work, tend to mark an inventory in much the same way 
that they did while they were in training, is he to conclude that the 
inventory is valid for high school students who say that they want 
to go into such occupations? Do not the findings just suggest that 
specialized training tends to produce less flexible persons? Does it 
not then seem possible that students may develop the suitable 
interest patterns by taking the specialized training? Does evidence
that persons who score higher on the insurance salesman score than 
others indicate anything other than the fact that some good sales- 
men of insurance are also good sellers of themselves on the inven- 
tories? And, of course, since individuals are irreversible it cannot 
be shown that they might not, after all, have been better social 
studies teachers than insurance salesmen. 

It is often argued that performances and choices of recent gradu- 
ates provide no criteria of whether the students have chosen to 
enter fields of their interests. If this is so, authors of inventories 
and questionnaires should state this very clearly in their manuals, 
and counselors should pass the information on to their clients. The 
counselor’s job will then become more complex. He will have to 
make it clear to counselees that these scores have little to do with 
the next few years but will be important several years later. Those 
who know high school students well will find the suggestion amusing.
College students will, of course, be planning further ahead
than random high school students. 

It seems from the above that the validity of the inventory on
which the most follow-up data are available is, to say the least,
questionable. The manuals of the various inventories and the many
short-term studies in the literature do not clear up the doubts that
one must have about the validity of such instruments.*

* See the numbers of the Review of Educational Research entitled
"Educational and Psychological Testing" appearing at various
intervals from June, 1932, to February, 1956, and the Annual Reviews
of Psychology for summaries of and bibliographic reference to such
studies.

REASONS FOR LACK OF VALIDITY

One of the outstanding weaknesses of the inventories, apparent
to anyone who will look and fully established by research,* is the
fact that their forecasting efficiency is severely limited by the faking
of responses. To those who claim that a subject may not fake
responses when good rapport has been achieved, it should be pointed
out that there is no way to tell whether or not he has faked his
answers. Every score must be questioned regardless of the
conditions under which it has been obtained.

* See: Arthur L. Benton and S. I. Kornhauser. "A Study of Score
Faking on a Medical Interest Test." Journal of the Association of
American Medical Colleges, February, 1948, 57-60. Edward S. Bordin.
"A Theory of Vocational Interests as Dynamic Phenomena."
Educational and Psychological Measurement, Spring, 1943, 3:49-65.
Donald S. Patterson. "Vocational Interests in Selection."
Occupations, May, 1946, 152-155. Verne Steward. "The Problem of
Detecting Fudging on Vocational Tests." Personnel Reports for Sales
Executives, January, 1947. H. P. Longstaff. "Fakability of the Strong
Vocational Interest Blank and the Kuder Preference Record." Journal
of Applied Psychology, August, 1948, 360-369.

All faking may not, of course, be deliberate. The subject may not
be aware that he is cooking the results while he is actually in the
process of doing so. Consider, for example, the following case:

Jim, a sophomore in college, had become engaged to the
daughter of a wealthy business man who had promised to set
him up in the business after the marriage. Jim had some qualms
about becoming dependent on his prospective father-in-law and
decided to make a career for himself in psychology. At this
period he made a high score for psychologist on a well-known
interest-measuring device. After a year of psychology, during
which his grades were low and the attraction of a good business
opening was having less influence on his conscience, he decided
to accept his father-in-law's offer. He took the interest-measuring
device again, scored high this time in business and found in his
score a satisfactory justification for his decision. After all, the
tests showed that he was interested in business.



Perhaps Jim’s case is not typical and it might even be argued 
that the scores provided some useful hypothesis about him, but the 
inventory results seem only to have elaborated the obvious since he 
had faked it to suit his stated desires. The point here is that faking 
can occur and the counselor can never be quite sure about the 
absence or presence of deliberate or unconscious faking of re- 
sponses. It is possible that results may be cooked for such reasons, 
among others, as attempts to justify acceptance or rejection of a 
stated choice, desire to please or disturb someone who will see the 
scores, attempts to beat the game for the pleasure of doing so, and 
wishful thinking. Suggestions by some authors for overcoming the 
faking of responses such as emphasizing speed, trying to encourage 
students to be honest, and viewing very high scores with suspicion 
would not be necessary if the instruments really did the job they 
are designed to do.

But assuming that there has been no deliberate or unconscious
faking of responses, there must always be a question about the
subject's reason for marking the items the way he does. The
following anecdote told by a school counselor seems pertinent:

A semiliterate country boy, to whom the Kuder Preference
Record had been administered, walked into my office, test profile
sheet in hand. A glance at the percentiles revealed that his
highest score was in the Literary area, around the 85th
percentile. Not immediately taking up discussion of his test results,
I talked with him a bit. As judged by the caliber of his
responses, and the general level of his communicative ability, I
marveled at the high "expression of literary interest." Finally I
remarked upon his score and asked him if he read a great
deal. "Naw," he replied, "I don't read nawthin' much, onct in
a whal a detectif magazine." When further pressed concerning
his choice on the preference record, he explained that he thought
it would be nice to be an author with nothing to do but write
books.

Although there was no follow-up in this case, the other data
available on this boy indicated that a literary career for him was
highly unlikely.

The reasons why a person indicates the preferences he does must 
always be subject to misinterpretations. To do counseling based on 
lists of responses without knowing the why of them is to perform 
at a very low level. Counselors, if they are to be effective, must
work at a level beyond that which is implicit in securing rapid and 
superficial estimates of abilities, aptitude, and interests. 

One of the most disturbing factors in the interest measurement
movement is the number of counselors who do not realize that the
names of some of the inventories or subsections of them, and the
titles indicated by scores on their items, are simply christenings. An
author frequently chooses to name a test, a group of items, or a
single item as, say, a measure of mechanical interest because he
thinks it is. He chooses to christen another item as a measure of
clerical interest because it appears to him that that is what it might
be. Then, of course, after he has christened it he chooses a system
of attaching numbers to it, and it seems to become a scientific
objective test. The internal consistency indexes that the test authors
employ indicate only that they have been fairly consistent in the
christening process. The reader might find it interesting to guess
the area in which the following items, taken from inventories, are
said to be measures of preference or interest and then look up the
key to see how they are actually scored. The items are: build bird
houses; doctor horses, cattle, or hogs; care for and repair people's
teeth; develop a variety of pitless cherry; be an expert in cutting
jewels; camp out; sleep in a tent or in the open; enter the bureau of
printing and engraving; be a psychologist.

The surprises that one gets from such an exercise are many and 
the implications great. An author has christened an item in one 
way, but the counselor might christen it otherwise and would chal- 
lenge the author’s right to give it the name he has given. If the 
counselor were to continue the exercise and continue to challenge 
seriously the groupings of items and the names given to them, he
would then wonder if anyone could be expected to take the scores
seriously.

The point about the christening of items does not negate, of
course, the previous point about the faking of items. There will be
enough obvious items about which most persons can agree at least
at a superficial level. Some of the dullest counselees will recognize
that "playing a piano" is supposed to be an indication of musical
interest and can cook that one easily.

The problem of interpretation is complicated, too, by the
vocabulary problem. Consider the item "actress" which appears in one of
the common inventories. The student is required to respond to this
item by indicating like for, dislike for, or indifference to that word
alone. The word covers a wide variety of performers from
burlesque queen to moving picture star, to television bit performer,
and to others. The advantages and disadvantages of being an
actress of each of these kinds must be known to few of the
responders, the pay range must be a question of speculation, and the
opportunities in the profession must be known to a limited few.
Without such knowledge the subject is to express like for, dislike
for, or indifference to something about which she knows little. Yet
that response is supposed to be used to help her to make one of the
most important decisions of her life, the selection of a career. The
argument that such items do differentiate occupational groups in
general and may therefore be useful regardless of the subject's
understanding of the items will be treated later. The suggestion
that the objection to the inventories on the basis of vocabulary can
be taken care of by providing a brief vocabulary list is ludicrous
in its simplicity because a vocabulary list inclusive enough to
incorporate all the definitions, their subtle shadings, and their
variability would require weeks of study. As has been suggested, just
one of the words, "actress," could be studied in all its ramifications
and details for many, many hours. It is discouraging to find that
some authors who have pointed out the vocabulary difficulties
involved have suggested only that the wording be changed rather
than that the instruments be discarded.*

Even the vocabulary problem would not be so serious were it not
for the fact that there is great unevenness in the knowledge of the
various items to which a subject must respond or choose among. In
one preference record, high school students, even junior high
school pupils, must choose the best liked among the following
items: (1) sort and catalogue a valuable stamp collection, (2)
write a popular article on how a diesel engine works, (3)
determine the cost of manufacturing a new soap.

The first may have been a subject of lifelong concern to a
counselee and he may know a great deal about it. To the others he may
not have given a moment's consideration before taking the
inventory and he knows little about them. Since
ha"e some Uttle knowledge about most of *e items 
will not, of course, be entirely random. He J'-' “““f 
toowledge to a^choice long the 

of the most important events m his “P“‘““ ,. ^ lukewarm 

his enthusiasm for it more forcibly than e ran aeserves 

interest. Johnson comments my tenure of 

more attention than it has receive . 7 decisive and 

punch counting I tody penetrated the paper, 

vigorous; others were hesitant and y P • ^ „ -^Tiy 

Some were bull's-eyes and others appeared on the periph ry y 

r c-wn« Interest Inventories With Respect 

-Edward C. Eoeber, "A Scpt=»b=r. 19<8, 42:8-r7. 

to Word Usage." Journal of Bdueattond Res 



298 Measurement for Guidance 

a coofiguralion o£ faint punches neat the ciidc s edge should count 
as much as those accurately made and with sufficient force to mar 
the table and blunt the pin is indeed hard for me to understand. 

It seems rather apparent that a count of perforations is a pretty 
crude measure of interest.” ** 

The counselee may want to answer, when asked if he likes something,
"Yes! Yes! Yes!" and to another question a hesitant, "Y-e-s,"
but he must circle the L’s on the inventory in exactly the same way 
as if the differences did not exist. Research seems to show that ex- 
tended opportunities to show enthusiasms add little to scores and 
do complicate scoring, but enough research on the importance of 
strong single interests has not been done simply because enthusi- 
asms have been smothered in the multiplicity of items to which the 
person must respond. In counseling, the importance of real en- 
thusiasm cannot be ignored or given minor consideration. Devices 
that do not provide for their expression cannot be effective in any
phase of the counseling process. 

* [...] Inventory." [...], February, 1952, 30:3[...].

It is commonly said that, though the total scores on inventories
may be questionable, the separate items may be of some value in
giving leads for interviews with counselees. It should be obvious,
if the points given above are clear, that the statements about the
questionable validity of the total test score apply equally well to
the individual items and that single items may give as false leads
as do the total scores. Further, however, it should be pointed out
that the shotgun approach of throwing hundreds of items at counselees
must often miss. To cover all the items that may be of particular
significance to each counselee would require the use of
thousands of items, and the significance of the one or even a few
that might be important to him could easily be lost in the multitude.
In what interest inventory can the boy who devotes all his
time to the flora and fauna of his surroundings really express his
interest? How can the boy who is set upon being a butcher or baker
express his enthusiasm? In what inventory will the girl who wants
more than anything else in the world to be a trainer of horses
indicate her interest? These are samples, and not unusual ones, of
the kinds of counselees who cannot express their real preferences
and who by being required to answer all the items must seem to
indicate interests that are foreign to them.

Counselors who need such crutches for their interviews as responses
culled from long lists of inventory items ought to consider
seriously whether or not they are skilled enough to stay in their
profession. When a counselor has to turn to the general, the remote,
and impersonal method of the interest questionnaire to get
his interviews going and keep them moving he is likely to miss 
the specific and close and personal things that are really important 
to this particular counselee. He is likely to miss that essential factor 
in good interviewing, the encouragement of the counselee to reveal 
himself more fully than he can possibly do by circling symbols or 
punching holes. 

In view of the pleasure that students get in taking the inven- 
tories, one might keep them for entertainment purposes — except 
for the fact that it seems impossible to prevent students from mis- 
interpreting the results. Try as hard as a counselor will to prevent 
it, many students will believe that they have taken aptitude tests, 
employment tests, or vocational fitness tests. Having heard about 
scientific tests of aptitude and, in their ignorance, having been im- 
pressed with their value, they insist on reading into their interest 
inventory scores something that was never intended. The very na- 
ture of the inventories, their scores and norms, encourages this kind 
of misinterpretation. It is most unfortunate that inventory authors 
and publishers have not taken enough action to prevent the misuse 
and misunderstanding that are common among those who take the 
questionnaires. 

On every page, perhaps even temporarily after every item if one
must use the inventories despite all that has been presented on the
preceding pages, it should be indicated that the items do not measure
aptitude or ability. The counselor should teach his counselees
to keep repeating the directions for taking the inventory when they
answer each item and those directions will usually tell them that
they should "assume that they have the training and experience
(that may mean as much as five to ten or more years) necessary for
all the activities." If that alone does not inhibit them from making
too broad interpretations, maybe the further direction that they
are to choose, "What would you do as a regular thing if you were
equally familiar with all the activities?" will do so. Even the dullest
counselee will realize that he cannot be equally familiar with
all the items.

It is probably true that many persons who use interest inventory
scores have very little insight (though the authors of the tests may
have) into the questionable assumptions and statistical procedures
that are utilized in their construction, scoring, norming, and attempts
to provide evidence of validity. Many of the procedures
assume that the characteristics that are alleged to be measured are
normally distributed despite the very obvious evidence that social
pressures continuously skew such distributions and local mores,
economic factors, nepotism, and endless other factors distort them.
Do the users of the inventories realize that the items the subjects
do not mark contribute, in a way they never intended, to certain
areas because the forced choice residuals are scored? Do they really
agree that the individual is atomistic in nature, that the atoms are
independent and that they can be lumped together by simple addition
to produce a really meaningful whole? Have they forgotten
the lessons they learned in elementary school about the need of
having something in common among the added items to make the
sum meaningful? Do they know what kind of curious but involved
reasoning is implied when it is suggested that scores above the
seventy-fifth percentile are significant? Are they aware that there
may be great overlap between two groups of individuals whose
means differ significantly in, say, clerical and scientific interest? Do
they know that much of the so-called validity data can be produced
by the responses of that relatively small proportion of students
in a total population who have such obvious interests that no
measurement is necessary?
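
The question about overlap between groups can be checked with a little
arithmetic. The sketch below is an illustration only. The group means
(50 and 53), the standard deviation (10), and the group size (400) are
hypothetical figures, not drawn from any inventory manual; the function
names are invented for the example; and the scores are assumed to be
normally distributed, which is the very assumption questioned above.
It computes the z statistic for the difference between two means and
the share of area that the two score distributions have in common.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_z(mean_a, mean_b, sd, n_per_group):
    """z statistic for the difference between two group means.

    Assumes both groups have the same standard deviation and size.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    return abs(mean_a - mean_b) / standard_error


def distribution_overlap(mean_a, mean_b, sd):
    """Proportion of area shared by two normal score distributions (0 to 1)."""
    return NormalDist(mean_a, sd).overlap(NormalDist(mean_b, sd))


# Hypothetical "clerical" and "scientific" interest groups.
z = mean_difference_z(50.0, 53.0, sd=10.0, n_per_group=400)
overlap = distribution_overlap(50.0, 53.0, sd=10.0)
print(f"z = {z:.2f}, overlap = {overlap:.0%}")
```

With these assumed figures the difference between the means is
significant beyond the .01 level, yet the two distributions share
nearly nine-tenths of their area, so that a single counselee's score
says little about the group to which he belongs.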

It does not seem likely that interest inventories will be easily
routed from the scene because students do seem to enjoy filling
them out. One teacher of English recently indicated that she had
no faith in them but that she would not give them up for anything,
"The two periods in which the pupils take the preference
record, score it, and plot their profiles are two periods of the course
that they like best." There is something fascinating in the popping
noise that results when they push a pin through the stiff paper and
it's fun, students say, to dream about being actors and astronomers
and psychologists. Many of them have not had so much fun with
pins and paper since they left kindergarten and it is a welcome
change from school routine. Maybe they should not be denied the
entertainment, but counselors ought to recognize that, until more
convincing data about their value are available, interest inventories
do not offer much more than that. Then they will have to consider
whether the fun is worth the price.

VALIDITY OF PERSONALITY APPRAISAL TECHNIQUES 

In the sections immediately above the discussion has centered 
around the interest inventories but much of what has been said is 
equally applicable to the so-called tests of personality. Examination
of the literature about personality questionnaires and controlled
interview techniques such as the Minnesota Multiphasic
Inventory and the Rorschach reveals the same kinds of superficial 
reasoning noted above. A jump is made from platitudes about the 
importance of being adjusted, of being a well-balanced personality 
and being free from neurotic tendencies, to the suggestion that 
these things can be assessed in a dependable manner by responding 
to a long list of questions in lists or cards, or to ambiguous stimuli
in the form of pictures or ink-blots. Samples of such statements
follow.

Insistence on respect for the “wholeness” of the adjusting organ- 
ism, or guidance of the whole student, represents a major contribu- 
tion of the modern movement in education. This personality test is 
an implement or tool through which the teacher can more easily and 
effectively approach this desirable goal. (Manual of Directions for
the California Test of Personality—Secondary Series. Los Angeles:
California Test Bureau, 1942, p. 1.)

The busy professional counselor and the guidance-minded class- 
room teacher have indisputable need for assistance in evaluating the 
various facets in the personalities of their counselees. Ability level and
interest pattern alone will not explain all the ramifications of individual
behavior. . . . Fully to appraise the complete individual,
therefore, one must of necessity include the pertinent personality
factors in the equation. The Personal Adjustment Inventory has
been designed to assist in this process. (Manual for the Heston
Personal Adjustment Inventory. Yonkers, N.Y.: World Book Co., 
1949, p. 1.) 

Each of us has relatively permanent personality characteristics or 
traits known as our temperament. These aspects of personality are
important for an understanding of ways we will act in school or
industrial situations ... we need a schedule that emphasizes important,
stable traits which describe how normal, well-adjusted people
differ from each other. The Thurstone Temperamental Schedule was
devised for this purpose. . . . Seven areas of temperament are
appraised in a relatively short questionnaire. (Examiner's Manual for
the Thurstone Temperament Schedule. Chicago: Science Research 
Associates, 1950, p. 1.) 

Following such generalized statements in the manuals, there is 
usually a series of dreary statements about validity that would 
strain the credulity of anyone who has had even minimum ac- 
quaintance with that topic. One is given coefficients of correlation 
with tests that are said to have been previously validated (Manual
for the Bernreuter Personality Inventory); evidence that the inventory
does not do what it purports to do with attempts at explanations
(Manual for the Thurstone Temperament Schedule, p. 10);
low coefficients of correlation with self-ratings and those of other 
persons (Manual for the Heston Personal Adjustment Inventory, 
pp. 27, 28), and various other timeworn procedures that pronounce 
loudly to all who will listen that validity of these techniques is 
questionable. 

Perhaps the continued widespread use of these devices is due in 
large part to the desire of many persons to get some data quickly.
It has been pointed out by several writers that such devices vainly 
seek the pot of gold at the end of the rainbow — a simple, cheap, 
foolproof method for studying human personality. But fools con- 
tinue to rush in where angels fear to tread and there seems to be 
no immediate hope that the waste of time and money on such in- 
struments will soon be reduced. Though research on human per- 
sonality should not be discouraged, it seems clear that nothing 
revealed by the study of behavior suggests that there is promise in 
the continued use of the standardized questionnaire or inventory 
method.* Whether or not the counselor agrees with these latter
statements, he will be forced to agree that there are no data on the
predictive validity of such instruments that justify their current use
in counseling.

* Raymond B. Cattell states the situation well in these words. "The only situation in
which a questionnaire would have complete validity would be a person with complete
integrity and complete self-knowledge—and such a person would scarcely
need a personality test." Raymond B. Cattell, Description and Measurement of Personality.
Yonkers, N.Y.: World Book Co., 1946, p. 343.


VALIDITY OF PROJECTIVE METHODS

Discouraged by the lack of evidence on the validity of structured
personality measurement devices, the counselor may be tempted to
turn to projective methods (not tests) in the hope that he will get
some valid measurement of personality. After he has examined the
evidence, his discouragement may turn to complete disillusionment.*
Attempts to validate such devices in all the usual ways
have resulted invariably in negative results or in findings of such
low relationships between projective scores and criteria that their
use for diagnosis and prediction is profitless. (It is possible that
psychiatrists and clinical psychologists can use such methods as
aids to diagnosis in seriously disturbed cases, but counselors in
educational institutions are not usually clinical psychologists or
psychiatrists.) Projective methods have been used in the study of
such groups as obese women, blind adults, stutterers, adoptive parents,
discordant marriage partners, children with reading disabilities,
unsuccessful students, applicants for admission to various
kinds of training, and many other groups. In most of such studies,
which seem more like advertisements than experiments, it is difficult
to separate what has been merely claimed from that which has
been amply demonstrated. In all cases the results offer the counselor
little to help him in his duties.

Projective methods raise all the usual difficulties in validation of
personality measurements mentioned at other places in this volume.
The use of confusing language in reports of subjects' performances
on the projective instruments makes their interpretation
particularly difficult. 

The following report on a student was submitted after he had
taken the Rorschach. It has been edited only in the sense that the
actual symbols of categories used by the clinician have been removed.

* J. W. M. Rothney and R. A. Heimann, "Development and Applications of
Projective Technics." [...], February, [...].


RORSCHACH REPORT ON A COLLEGE STUDENT


This is the Rorschach picture of a personality that is almost completely
constricted emotionally. He represses all tendency to
respond emotionally either to the stimuli of the world outside
him or to promptings from within (fantasy, creative thought).
Probably severe emotional shocks in early conditioning left him
afraid to respond emotionally to others, to trust them or love
them, for fear they might hurt him or let him down. Yet his
dependence on such external values is complete for he never
trusted himself enough to develop what inner resources he had
into values of his own creation.

His intelligence does not seem to be more than average (perhaps 
lower than college average) and this lack of superior intellectual 
ability makes any form of adjustment difficult for a constricted
person unless he is living in a group with low standards of 
intelligence and achievement. (Frequently the highly constricted 
person has superior intellectual capacities which enable him to 
attain accepted cultural values in the area of adjustment in
society.) But for this young man, three value-areas are closed
off—that of inner values of his own making, that of outer values
of emotional relationships with others, and that of outer values
of intellectual achievement. For him life is an insoluble conflict
between his dependence for security on the achievement of conventional
outer values and his inability to attain them because of
excessive distrust of himself and of others (inferiority feeling). 

He is very suspicious of others, particularly of people in posi- 
tions of authority. He is afraid of being imposed upon, afraid 
something will be put over on him, afraid above all that he will 
make a mistake, that he will be unable to understand and meet 
expectations, and that his status and security will be further 
diminished by his failure. Any unfamiliar situation where he is 
not sure of the outcome and has to take a chance is terrifying to 
him. He is especially afraid of any competitive or examination 
situation. He does not dare to depart from immediate perceptions
of reality, to generalize on the basis of past experience, and
to adapt to new situations on that basis. This clinging to im- 
mediate reality is illustrated by his repeated questions during the 
Rorschach test: "Are these supposed to represent animals or 
anything?" "Were these supposed to represent something?" He 
cannot get over the idea that the blots represent something in
reality which he is supposed to identify. Any notion of selecting
aspects of the blots for himself and outlining forms resembling
those he has seen in past experience is difficult for him
to hold.

He is very confused about himself and why he cannot succeed in
his endeavors as others do, but he has failed so often that he
has no confidence, always expects the worst, and because of his
fears is unable to utilize what capacities and information he has.
Compulsively driven to try to meet conventional standards, e.g.,
of academic achievement, he defends himself at the same time,
against his fear of failure, by a childish sort of halfway opposition;
i.e., frustrated, he directs his aggression outward. He hasn't
the courage or self-confidence to carry his oppositional impulses
very far into action but is sporadically resistant and submissive —
will contradict but smile as he does. This outward expression of
hostility is one of the means by which he manages to keep
functioning in the face of his severe neurotic conflict. It is probable
that if he directed his hostility toward himself any more than
he does, he would become completely depressed, losing what
stamina he has to continue his battle.

A second means by which he helps make life bearable consists in
a thought and action pattern of escape. He dodges and looks
sideways at every problem as a whole for fear it might be beyond
him. Instead, in his confusion, he focuses attention on some
small part which he thinks he can handle safely, and thus, frequently
overlooks the obvious and important, in favor of the
obscure and inconsequential. His academic work probably reflects
this habit of thought, as well as a tendency to make hasty,
inaccurate judgments in moments of defiance when he wishes to
get clear of a trying situation. His anxiety blocks his memory
and makes verbalization difficult.

He is, however, intelligent and rational enough to know, most of
the time, when he is not attaining standards expected of him in
both academic and social life, but he does not know why and
feels confused and helpless. His occasional awareness that many
of the satisfactions others find in life do not exist for him, leaves
him with a sense of emptiness and futility. Although such feelings
have a depressing effect, they are healthy insofar as they
indicate that he is not reconciled to his abnormal way of life.

Perhaps the description given above may not be representative 
of what projectionists commonly report, but if there is a difference 
it simply emphasizes the fact that variation in reporting methods
makes the problem of getting dependable data more acute. 

The counselor may wonder whether the above was worth the 
expenditure of approximately four hours for administration, scoring,
and writing the report. It is said that the student is "almost
completely constricted" but no definition of "almost" is given. He
is said to repress "all tendency to respond emotionally," which
seems to be an impossible situation. It is said that "probably severe
emotional shocks in early conditioning left him afraid to respond 
emotionally to others” but that statement permits the additional 
one that probably it did not. It is said that "his intelligence does 
not seem to be more than average” and one would wonder if there 
are not better ways of estimating intelligence (undefined here) 
than by use of the Rorschach method. With the above suggestions 
for critical analysis of the report the reader may wish to continue 
his appraisal of it and try to decide whether it would be useful in 
counseling. 

The writers are in sympathy with the general projective idea 
and believe that it may be used informally and with locally devised 
materials,* but they find that scores derived from the standard
instruments defy interpretation. The difficulties are described in 
the following passage. 

* J. W. M. Rothney and Bert A. Roens, Counseling the Individual Student. New
York: Dryden Press, 1949, pp. 132-134.

Projectionists do not have any common metric comparable to the
difficulty level concept used by achievement and intelligence testers
on which their items can be scaled. Most of their "tests" must be
administered individually and the time element becomes important.
It may require about four hours to administer, score, analyze, interpret,
and report a Rorschach. When that time is contrasted with the
unlimited number of test administrations that might be given with a
group test, it will be seen that projective norms are harder to establish,
and one can understand why the number of subjects in projective
research is frequently small. Statisticians have not yet developed satisfactory
techniques for treatment of the variables and relationships
which projectionists profess to abstract from their data. And it is still
impossible in the very nature of the projective situation to untangle
the test administrator from the score.*

It appears, then, that validity of projective techniques has not 
yet been developed to a point at which they can contribute signifi- 
cantly to the work of the counselor even though they may be of 
value for the work of the clinical psychologist or psychiatrist. 


RELIABILITY OF INTEREST AND PERSONALITY MEASUREMENTS


Although the sections on validity in the manuals of interest and 
personality inventories are usually short and devoid of data, one 
commonly finds long sections on reliability. These may be impres- 
sive to many test users who are still confused about reliability and 
validity and who tend to interpret reliability coefficients as indicating
dependability. The instruments might be more useful if the
time, energy, and space that were devoted to computing and re- 
porting coefficients of reliability were spent in further study of 
validity. 

The usual methods of computing reliability for achievement and
aptitude tests have been carried over directly to the fields of interest
and personality measurement, but there is some reason to doubt
that the conditions are similar enough to warrant such carry-over.
The split-half method described in Chapter III requires that the
items in each half be of equal variability and of the same quality
as those in the other half. It is extremely doubtful that the items
on inventories can be of the same quality to a particular individual.
A subject may be consistent in his circling of the L (for like) after
such items as actor, astronomer, fat men, and snakes, but it seems
rather dangerous to assume that these have the same quality to him
for all of the reasons given above in the section on validity of
interest inventories.

* [...] R. A. Heimann, "Development and Application of Projective
Technics." Review of Educational Research, February, 1956, 26:5[...]-71.
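
The arithmetic of the split-half method may be sketched as follows.
This is a minimal illustration, not the procedure of any particular
manual: the odd-even division of items, the Spearman-Brown step-up,
the function names, and the six hypothetical subjects (1 = item marked
L, 0 = otherwise) are all assumptions made for the example. It will be
noticed that the computation simply adds the items in each half, and so
takes for granted the equality of items that is questioned above.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full length."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half coefficient; one list of item scores per subject."""
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical responses of six subjects to six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(responses), 2))
```

Nothing in the computation asks whether "actor" and "snakes" mean the
same kind of thing to the subject; a coefficient is produced in either
case.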

Since very few interest or personality inventories have two or
more parallel forms, the method of administering two forms and
computing the correlation between scores on them is seldom used.
The coefficients obtained from the test-retest method with varying
intervals between two administrations of the inventories generally
lie between .50 and .70. They are not high enough to permit accurate
forecasting of an individual's score at a later time.* When
one finds studies using this method it would seem well to remember,
in interpreting them, that they report consistency of scores,
not necessarily stability of interest or personality.
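
How far a coefficient of .50 to .70 falls short of accurate forecasting
can be seen from the standard error of estimate. The sketch below is an
illustration under stated assumptions: a linear regression of retest
score on first score, equal means and standard deviations on the two
occasions, and a hypothetical converted-score scale with mean 50 and
standard deviation 10. The function names and the sample score of 65
are invented for the example.

```python
from math import sqrt


def standard_error_of_estimate(sd, r):
    """Spread of actual retest scores around the predicted retest score."""
    return sd * sqrt(1.0 - r * r)


def predicted_retest_band(score, mean, sd, r, z=1.96):
    """Approximate 95 per cent band for a later score, given the first score."""
    predicted = mean + r * (score - mean)
    margin = z * standard_error_of_estimate(sd, r)
    return predicted - margin, predicted + margin


# Hypothetical counselee with a first score of 65 and a test-retest r of .60.
low, high = predicted_retest_band(65.0, mean=50.0, sd=10.0, r=0.60)
print(f"retest score expected between {low:.1f} and {high:.1f}")

for r in (0.50, 0.60, 0.70):
    print(r, round(standard_error_of_estimate(10.0, r), 2))
```

Under these assumptions a counselee who scores a standard deviation
and a half above the mean on the first occasion may fall anywhere from
below the mean to well above his first score on the second.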

The test-retest method produces what some persons have called 
a coefficient of stability. It has long been recognized that memory
of items and responses and general "test-wiseness" in the test-retest
method produce higher coefficients than are obtained by other
methods. In personality and interest inventories it seems likely that 
the memory factor would be more influential than in achievement 
testing because many of the items may be more emotionally 
charged. There is always, too, the problem of variability in moods 
from time to time. 

Use of the test-retest method in the study of reliability of interest
tests may be as useless as trying to determine the reliability of a
thermometer by checking the readings made at one hour against
those made at a later hour during a day. The answers to questions
on an inventory about feeling of belongingness on a day when
family circumstances have been ideal for a youth may change significantly
if he happens to be questioned the day after a family
quarrel has resulted from his insistence upon borrowing the family
car.

* See such studies as the following: R. Jacobs, "Stability of Interests at the High
School Level." Educational Records Bulletin, No. 52, 1949, pp. 83-87. W. K.
Trinkaus, "The Permanence of Vocational Interest of College Freshmen." Educational
and Psychological Measurement, Winter, 1954, 14:641-646. W. L. Layton,
"The Variability of Individuals' Scores Upon Successive Testings on the Minnesota
Multiphasic Personality Inventory." Educational and Psychological Measurement,
Winter, 1954, 14:634-640.

All the above considerations and those previously reported in
Chapter III must raise some doubt in a counselor's mind about the
meaning and value of the so-called coefficients of reliability that he
finds in manuals for interest inventories and questionnaires. It
seems unlikely in view of the lack of validity data that he would
ever use these instruments, but, if he should be tempted to do so,
the absence of clarity about stability of scores should inhibit him
completely.

NORMS OF PERSONALITY AND INTEREST INVENTORIES

In the pages above, the data on validity and reliability of inventories
and questionnaires have been examined and have been found
inadequate. When further questions are raised about their norms, 
chaos is added to confusion. 

Some authors of inventories and questionnaires have carefully 
avoided the presentation of norms but others present elaborate 
norm tables. Still others, while they do not use the term, imply that 
there are norms by suggesting that one can get an A, B, or C by 
marking an inventory in a manner similar to the way that members 
of an inadequately defined group mark them.

During the early stages of the guidance movement, the process
of vocational counseling was described as one of putting the square
peg in the square hole. It was suggested that certain occupations
required certain kinds of personalities and capacities and that the
job of the counselor was to find the persons with suitable patterns
of both, and to steer them into suitable occupations. This kind of
thinking is retained by some authors and users of inventories who
seem bent on obtaining measurements of the individual's interest
patterns or personality traits and directing the individual (subtly,
of course) into an occupation or into training for it. To such
persons an array of scores obtained from an inventory, a questionnaire,
or a controlled interview is necessary. They want a score
that is quickly obtained and readily transmitted into a letter grade,
percentile rank, or other converted score so that it can be used for
educational vocational guidance or personality therapy.

It should be clear to the counselor that there are no satisfactory 
measures of vocational interests (the name, "Strong Vocational 
Interest Blank," for example, is simply a christening by its author), 
preferences, or personality characteristics of any occupational or 
educational group. And it should follow very clearly that, until 
such measures are available, the norms provided with personality 
and interest questionnaires are useless for counseling purposes. The 
elaborate profiles that are produced from the scores are socially,
psychologically, and mathematically unsatisfactory. Tyler has ex- 
pressed this very well in the following paragraph, although it is 
unfortunate that she uses the word "test" when she means inventory
or questionnaire.

It seems to be coramooly believed by vocational counselors that 
personality tests can be used to ascertain whether an individual has 
the special traits required by a certain occupation. Is this boy domi- 
nant enough to be a good salesnun? Is this girl well adjusted enough 
to be a sodal wotker.^ Unfortunately there is almost no research evi- 
dence that warrants our using personality tests in this way. There is, 
on the other hand, a considerable body of evidence showing that such 
use is unwarranted. There are two things lacking. In the first place 
we do not know what the personality traits essential to the various 
occupations are. In the second place we cannot be sure that the in- 
ventories measure what we think they do. This means that when a 
counselor tells a client, "This test shows you are a dominant person 
who should be able to succeed as a salesman,*' it is much worse than 
telling him nothing at all and leaving him to make a decision on 
other grounds. 


312 Meosurement for Guidance 

7he d/sadvanlagei of bringing them (peisonality measurements) 
in may well outweigh any possible advantages. The essence of this 
kind of counseling is perfect candor; the counselor does not withhold 
information from the client. But what is one to tell a person who 
makes T-scores of 70 or higher on the Hs, D, and Pt scales of the 
Minnesota Multiphasic Inventory? Even if it were good therapeutic 
practice to state that he showed an unusually high level of the per- 
sonality characteristics found in hypochondriacs, depressives, and 
obsessive-compulsive neurotics — and it obviously is not what any
therapist would approve of — there is not even sufficient scientific
evidence to warrant such a statement. A better procedure would be to 
phrase the meaning of the scores in simple, nontechnical language 
and say, "This test shows that you are more likely than the average 
person to be concerned about symptoms of possible illness, to feel 
discouraged and depressed at times, and to get ideas and impulses
that you can't shake off," and then use this statement as a starting
point for further discussion. The value of the test results to him,
however, is still very doubtful. If the interviews encourage free
expression the symptoms that are complicating his life will come up for
consideration eventually anyway. All that the test has accomplished 
by bringing them out in the beginning in this form is to add to the 
load of anxiety he is carrying." (Italics added.)

As indicated in the above quotation, it would be difficult to in- 
terpret norms even if it were possible to have some that were based 
on sufficient numbers of well-described subjects. The difficulty is 
compounded when the subjects are insufficient in number and
inadequately described in terms of such factors as age, sex, education,
religion, social class, rural or urban residence, geographic location, 
intelligence test scores, occupation, marital status, health, and the 
scores of other factors that may determine responses on interest and 
personality assessment devices. The tendency to use adults confined 
to mental institutions as basic groups, against which to compare
controlled interview or questionnaire responses of young persons
who are not, has encouraged some weird interpretations of scores 
and some strange distortions of terms that have been previously 
attached to severe maladjustments. The complex problem of
developing suitable norms in personality and interest measurement
seems not likely to be solved by the use of captive audiences (one 
author has described a large segment of them as an "atypical
minority of studious, docile, and intelligent humanity which sits in
university classrooms”) as basic groups whose scores are to be 
used in comparisons with those who are free. 

In a report on norms in general the following statement appears:

Unfortunately, many alleged norms reported in test manuals are 
not backed by even an honest effort to secure representative samples 
of people in general. Even tens or hundreds of thousands of cases can 
fall woefully short of defining people-in-general. Inspection of test
manuals will show (or would show if information about the norms 
were given completely) that many such massed norms are merely 
collections of all the scores that opportunity has permitted the author 
or publisher to gather easily. Lumping together all the samples 
secured more by chance than by plan makes for impressively large 
numbers; but while seeming to simplify interpretation, the norms
may dim or actually distort the counseling, employment, or diagnostic 
significance of a score.

If this statement is true about the relatively simple measurement 
of achievement and so-called aptitude, and it seems to be, the 
counselor should consider how much more dimming and distortion 
the inadequate norms for the personality questionnaires and interest 
inventories produce.

SUGGESTIONS FOR THE COUNSELOR 

In view of the many limitations of attempts to measure interest 
and personality, it would seem well for the counselor to eschew 
them completely. Indeed, it would seem desirable to have authors 

Harold G. Seashore and James H. Ricks, Jr. "Norms Must Be Relevant," Test
Service Bulletin No. 39. New York: The Psychological Corporation, 1950, pp. 16-17.



and publishers agree to declare a moratorium on their production 
for, say, a 20-year period or until the time that researchers or 
philosophers or both could, by concerted efforts, develop more
satisfactory instruments. (Just how some counselors would spend
their time and how journals would keep up their publications with- 
out some scores to manipulate and report would present new prob- 
lems.) Measurement in any field must always follow a long period 
of description and it appears that we are only in the descriptive 
phases of the study of personality and interests.

While counselors are waiting for better instruments it would
seem desirable to heed these words of one author: ". . . Those
who have real professional training will not need a system. Those
who lack psychological knowledge will help pupils more effectively 
by using simple human warmth and interest than by thumbing a 
handbook of oversimplified recipes.” ** 

Counselors should consider, too, the words of still another writer 
presented below. 

It is imperative that we come to understand some of the forces that
are driving children to act the way they do. It seems to be an absolute
necessity for us to try to think that learning is not only positive in the
sense that one can facilitate it by understanding the laws of learning, 
but learning can often be facilitated greatly by removing things that 
can be thought of as blocks to the process. 

Somehow or other, through this kind of working with children
(informal methods) they do feel a release; sometimes the greatest
form of progress is made, not by banging away at the intellectual
task, but by taking one's time at the emotional task and coming some- 
how or other into rapport with these children. Under these circum- 
stances they come to feel they are respected, wanted, and loved, and 
that they are missed if they are absent. They come to have self-respect
and, hence, develop respect for others. They release many talents
which have been hidden. They grow! **





The statement, and the article from which it is drawn, are not 
just sentimental statements but hard-headed recognition of the fact
that assessment of interests and treatment of personalities are com- 
plex processes that cannot yet be done quickly, impressionistically, 
or mathematically. 

If a counselor has been using the questionnaires or inventories 
and has had some misgivings about their use, he may want to try 
some other methods of getting meaningful information about his 
counselees. There are other ways of getting such information and 
most of them still seem more promising than the standardized 
instruments in this area.** Among them are systematic observation 
and reports by means of behavior descriptions obtained from those 
who have had sufficient opportunity to observe counselees in a 
variety of situations; cumulative reports of participation in organi- 
zations; reports of selection of activities, courses, units, or topics 
when choice is permitted; selection of jobs when several are availa- 
ble; and interviews specifically designed to discover interests that 
are meaningful rather than transient or whimsical. Finally, of 
course, if a counselor feels that he must have some sort of lists to 
which his counselees are required to respond, he can make up his 
own in a form that permits local references and adaptations,*® that
does not force choices where the counselee has no real choice,
permits expression of enthusiasms, and encourages him to state that he
has insufficient information on which to respond. In any case he 
should be sure to provide the opportunity for his counselees to tell 
in their own words and in the length and detail of their own
choosing what their interests, feelings, and problems really are.

SUMMARY 

In this chapter it has been suggested that there are no valid 

Education Reports No. 40. Washington, D.C.: American Council on Education,
April, 1950, pp. 63-73.

John W. M. Rothney and Bert A. Roens. Counseling the Individual Student.
New York: Dryden Press, 1949, Chapter III.

A sample of a locally devised inventory is presented in Appendix I. 



short cuts to the appraisal of personality, attitudes, interests, and
behavior of counselees. Examination of the form and content of
self-descriptive inventories, records, blanks, and projective devices
has revealed so many shortcomings that their use by counselors
cannot be recommended. (No judgment has been made about their
use by clinical psychologists or psychiatrists.) Claims for the value
of such instruments are either unsupported or the evidence that is
offered is inadequate. It has been suggested that the study of various
aspects of the complex behavior of individuals must undergo a
long period of descriptive study before valid measurement in this
area can be established. Until this is done it has been suggested that
the counselor should employ more direct and personal methods of
studying the behavior of his counselees by use of observation, behavior
description, interview, and analysis of performance techniques.

Exercises 

1. It is generally agreed that self-report inventories are easily faked.
What implications does this have for the actions of the counselor 
who is seeking data in the area of interests or personality? 

2. The Science Research Associates have recently presented a new 
form of the Kuder Preference Record, Form D. Compare and
contrast this new form with the previous forms on such factors as:

a. Empirical evidence of validity. 

b. Reliability information.

c. Norm groups presented for comparisons.

3. Take any five commonly used interest inventories and in a five- 
column chart list their stated purposes. In a second column find 
and list the author's stated theory of interests. 

4. As a counselor you have been asked by your school board for data
on the number of "maladjusted" students in your school. What
procedures would you employ in seeking the answer to this question?

5. At a conference of counselors you hear the statement, ". . . I
know [name of an interest inventory] has poor validity, but we
use it in our school because we find that the students like to take
it and we feel it motivates them to think about occupational
choice. . . ." Discuss the implications of such a statement.

6. Select five personality inventories and list the personality traits or 
characteristics presumably measured by each. How are these 
defined? Are the definitions consistent from inventory to
inventory? What factors might account for the inconsistencies? What
inferences would you draw from this exercise regarding the 
measurement of personality characteristics? 

7. Comment critically on the following statement.

The psychologist stands aghast at the self-assurance with which 
professional school counselors in America diagnose the personality 
faults of little children and the boldness with which they under- 
take the delicate task of adjustment . . . The student of genius 
who is familiar with the motivating influences that have their 
origins in quirks of childhood personality shudders to think what 
the results would have been if school counselors had had a chance 
to “adjust” the personalities of the budding geniuses of history. 
One can imagine them, freed from all their peculiarities and
complexes, adjusted to the world as it was and becoming
indistinguishable from the common herd.

8. The following letter was actually written to the parents of a high 
school student by a counselor. Comment critically on it. 

I am enclosing a report on the Strong Vocational Interest 
Test for Women which was recently given to your daughter. 

As you can see, her highest scores are in the business area, 
where she received two "A" ratings. Also to be considered,
I think, is a "B minus" rating as buyer, which might fit in
very well with the business interest. She is showing some 
teaching interest at the elementary level, but definitely not at 
the high school level. You can see that she also has a good
rating as a housewife! Even so, her interests so far as voca- 
tion is concerned are less feminine than the average girl. By 
feminine interests I mean interest in the esthetic, cultural, 
and in the more personalized things. For instance, she would 





CHAPTER IX 


The Future 


The treatment up to this point may be regarded indirectly as an 
overall description of the current state of testing for counseling in
the general secondary school program. The questions asked, the 
doubts raised, the criticisms implied, and the limitations described 
should make it clear that much remains to be done if the testing
program is to be improved.

If one were to take 1916, the year in which the Stanford-Binet
was published, as a base year, and appraise the accomplishments in
testing for counseling since that time, the results would be very
discouraging. The measurement in guidance movement that began
with such high promise has failed to meet the expectations that had
been raised by its auspicious start. When counselors seek tests that
meet the standards that measurement experts have themselves
called essential, the quest is generally unrewarding. Currently they
can get only approximations of what they need. Although they may
hope to get better instruments in the future, it does not appear that 
they will get them soon. 

One author who has had much to do with testing has described 
the current test situation very effectively in the following words: 

What about these tests? Are they all worth using? Consider this
unorthodox system of classification: (1) There are obsolete once
useful tests which would die gracefully but for the inertia of
test-users who still want them. (2) There are some ill-conceived tests
which should never have been born and some which have . . . these
remain alive because of the . . . test-users. In deference to the
proud authors I shall name none. (3) Some excellent tests are useful
for very . . . and will never have widespread usage. (4) Certain . . .
standardized tests have and should . . . application. Revisions and
improvements can be expected. (5) Some newer tests are still frankly
experimental and can be used cautiously by those who understand
their background and present status.

Even the items numbered (3) and the first . . . overly optimistic,
but the frankness of . . . who is concerned with the construction and
distribution of tests is . . .

Another author has been equally pointed . . . current situation in
testing. . . . personality measurement, his comments . . . measurement
in other areas . . . we have not known enough about our . . . validity
of every technical rating or statement . . . And, furthermore, we have
not been disposed to . . .

From the purely historical and developmental standpoint, it may

Harold G. Seashore.



appear to some that the authors of this volume have been disposed
to criticize unreasonably. It may be said, for instance, that compared
with other sciences, centuries old, psychological testing is but an 
infant. There is strong temptation to "go easy” with infants but we 
cannot go easy on testers at the expense of the youth to be coun- 
seled. From the standpoint of time, it can be agreed that psy- 
chological testing is indeed an infant, but the data regarding 
numbers of tests currently used strongly suggest that the infant is 
extremely large for its age and growing rapidly. 

What is really of great concern at this point is that, like other 
infants, psychological testing will grow up. The answers to the
questions "How will it grow up?" and "What will it become?"
must await the passage of time, but the discussion that follows
suggests some of the possible developments.

SOME POSSIBLE DEVELOPMENTS

No one can foretell accurately what the future of testing and 
test development is likely to be. The possibilities are varied; some 
are remote, others seem less so. No one of the possibilities
discussed below is likely to operate in isolation and whether any one
will ultimately dominate remains to be seen.

Major developments in testing seem likely, at this time, to take
place in areas other than in their application to counseling. This, to
a considerable degree, has been the case in the past and it seems
likely that it will continue to be so in the future. The motivation
for better development will probably come from forces outside
the school counselor's office: from the armed forces with their
problems of selection and from the manpower shortage of industry.
The results of development of testing in these areas, as suggested
in the opening chapter, have been felt in the schools and counseling
offices of our nation and it seems likely that they will continue to
be felt.

But will it just be more of the same? It seems likely that we can
look forward, at least in the immediate future, to what might be
described as "more of the same” with unsatisfactory new tests ap- 
pearing periodically as they have for the past quarter of a century. 
As with many new tools, these products are likely to be in the same 
pattern with similar built-in weaknesses and limitations as those 
that have characterized their predecessors. This trend may be 
tempered to some extent by the serious efforts of some authors and 
some publishers to meet the more rigid standards that appear to be 
emerging. Some factors that may influence trends are discussed 
below. 

MORATORIUM AND LEGISLATION 

One possible way to improve tests would be to declare a
moratorium on the production of new testing instruments for a period
of years. During the period obsolete and poorly conceived tests
would be killed off, gains would be consolidated, and standards of
test design and marketing might be strengthened. Unfortunately, 
such a plan can hardly be taken seriously, for the attractiveness of 
profits from test production and marketing is too great for many 
persons to resist. It does seem strange, however, in a country in 
which butchers' and grocers’ scales are regularly checked and 
policed, and clothiers' tags of "100 percent wool" must be
validated if the sellers of such products are to avoid imprisonment,
that a test distributor may sell his products without any supervision
or regulation. After reading many test manuals one is often left
with the feeling that “there ought to be a law.” And it may come 
to that. It seems to have been assumed in the past that educational 
or psychological tests could be produced and distributed without
any kind of regulation. It has been found necessary to enforce com- 
pliance with Pure Food and Drug Acts to protect even professional 
persons, who, presumably, should not need protection. Perhaps 
educators, psychologists, and counselors need similar legislation
for protection from those who have taken advantage of freedom
from control.

THE INFLUENCE OF RAPID CALCULATORS 

Current trends toward automation, development of rapid
computers, and the applications of punched card procedures seem
likely to influence the direction in which test construction and
utilization may proceed. While conditions may change rapidly, the
current international situation is such that mobilization of both 
military and civilian skills is a matter of great concern and methods 
of effective utilization of manpower are constantly being studied. 
Wesman, discussing the topic of what is new in guidance testing, 
reported an investigation that could not have been conceived as 
possible without elaborate computation procedures in the following
words: "Two government activities perhaps deserve first mention
because of their very scope. The first is a project being
directed by R. L. Thorndike for the Air Force, the ultimate purpose 
of which is to conduct an aptitude census of the American people. 
Eventually, this program would presumably result in cataloguing 
the population with respect to its abilities, just as the well-known 
censuses have catalogued it with respect to age, income, material
possessions, education level, etc." * 

The manpower shortage in many areas and in many skills has 
been viewed with alarm in many quarters and the efforts of many 
are being expended to seek out talent and to inventory potential. 
Educators who have attempted to define their role in this manpower 
shortage frequently advocate the use of large-scale testing programs
in the schools. In a recent publication of the Educational Policies
Commission, one of the implications for education of the manpower
situation is directed to guidance in the following recommendation:
"Improved guidance and counseling. Guidance services, uniquely
characteristic of American Education, should be further improved,
and so increased in scope as to involve all who teach and to reach
all who learn. Guidance programs should be soundly rooted in
understanding of the manpower situation." *

"Guidance Testing," Occupations, October, 1951.

The rising tide of school enrollments, unprecedented in our his- 
tory, may well be counted among the forces which, in a time of 
increasing automation, can precipitate a mass testing movement 
that could make our present efforts appear meager. 

Each of these forces has a "mass” dimension about it and solu- 
tions are likely to be sought through techniques in which great 
masses of data are gathered. The possibility of losing individuals 
in a mass of punched cards does not seem remote. This thought 
was envisioned by Hull, at least in part, as long ago as 1923. 
Because of its current appropriateness, his description of "a ma- 
chine which makes aptitude forecasts automatically” is quoted in 
full. 

Another system of making aptitude predictions from forecasting 
formulae has been devised by the writer in the form of an automatic 
machine. This machine is an integral part of a comprehensive
program of vocational guidance first sketched in 1923. The program
calls for the construction of a single universal battery of tests which 
shall sample, so far as possible, all of the important aptitude deter- 
miners. The battery will contain perhaps thirty or forty different test 
units and require a day or more to administer. Upon the basis of this 
one battery there will be constructed separate forecasting formulae for 
each of the more important type occupations — possibly to the number
of forty or fifty. Thus there would be forty or fifty different equa- 
tions, each equation weighting the tests of the one battery in a dif- 
ferent way so as to make the best possible forecast of a particular 
aptitude. These equations would, of course, all be much longer than 
the one given above for freehand drawing, each probably involving 
every one of the thirty or more tests of the battery. To make all the 
forty or fifty forecasts by such a system in the ordinary way would

* Educational Policies Commission. Manpower and Education. Washington, D.C.:
National Education Association, 1956. p.



involve something like 1,500 multiplications, all of which would
need to be summated in a more or less complicated manner. This
would represent a huge amount of labor, to say nothing of the human
errors certain to creep into such a large amount of hand work. The
forecasting machine mentioned above has been designed to perform 
this work automatically. Three of these machines have been con- 
structed. 

In its final form, this machine will have the different forecasting 
formulae placed in it permanently as a four-inch perforated band of
thin metal, somewhat resembling a music roll in appearance. The
test scores will be given the machine in the form of a similar
perforated band of paper upon which have been recorded the test scores
of a given subject. The test scores are recorded on this paper band by 
means of a special perforating device which is operated something 
like a typewriter. A series of forty test scores may be thus recorded
in about a minute. Once the test scores have been recorded on the 
paper band, it will be placed in the forecasting machine and the 
starter pressed. The machine will then proceed automatically, and 
without any attention whatever from the attendant, to make one 
aptitude forecast after another until the entire forty or fifty have been 
calculated. 

At the time of inserting the band of test scores there will also be 
placed in another part of the machine a card bearing the names of the 
subject and a blank form giving in a column the names of all the 
aptitudes and occupations for which forecasting formulae are
available. As the machine makes its forecasts, it will stamp them down on
this card automatically, opposite the names of the appropriate 
aptitudes. When the forecasts have all been made the machine will
stop automatically, at the same time ringing a bell to call the
attendant. The card of forecasts, when removed from the machine, will
then present in orderly array and in units of single uniform scale, 
permitting of instant comparisons, forecasts of the individual's 
probable success in all of the chief occupations of the world. The 
youth whose potential aptitudes are thus recorded may then examine 
the card to learn those vocations in which his chance of success is 
low. These may be avoided in his choice of life work. He may then
. . . the card to learn those vocations in which his . . . success is
greatest. The three or four most promising . . . emerging may be
. . . of his interests, opportunities, and general . . . finally be
chosen a life work.

It scarcely needs to be pointed out that the program of vocational
guidance thus briefly sketched is a . . . current development of
aptitude . . . will no doubt be considerable . . . from conservative
quarters. To this . . . program involves a vast amount of . . . quite
impossible of . . . But the logic of . . . end. We may look forward
. . . guidance for the masses of the people.

Much of what Hull envisioned in 1923 has . . . the components for
his "comprehensive program of vocational guidance" is available in
. . . Aptitude Test Battery published by the . . . Dvorak's description
of the . . . very closely the . . . "The basic assumption underlying
. . . that a large variety of tests can be boiled down . . . variety of
occupations can also be classified . . . groups according to
similarities in the . . . person's vocational abilities . . . one sitting
and to interpret his scores in terms of a . . ."

Clark L. Hull. . . . pp. . . .

. . . Guidance Journal, November, . . .

A battery of . . . Flanagan is designed for much the same purpose.
His battery . . . as follows in a publication of the Science Research
Associates: "The Flanagan Aptitude Classification Tests (FACT)
comprise the core of the aptitude batteries
in the Job-Test chart. These tests have been designed to provide 
measures of aptitude for 14 critical job elements. Two qualities of 
the FACT series appear to commend its use in programs of personnel
selection: (1) The test items are highly similar to tasks performed
by workers in business and industry. (2) Flanagan's
job-analysis studies have indicated recommended combinations of
tests corresponding to the requirements of particular occupations."
(Science Research Associates. S.R.A. Job-Test Chart, 1954.)

The computations involved in such approaches are no longer a
problem when digital computers are capable of handling almost
astronomical figures. The IBM type 650 computer, known as a
"magnetic data processing machine," could handle Hull's 1,500
multiplications with consummate ease. It can make 5,000
multiplications or 3,700 divisions in a minute, 78,000 additions or
subtractions of ten-digit numbers in the same length of time.
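
The arithmetic behind Hull's scheme can be sketched briefly. The fragment below is only an illustration, assuming a battery of 30 tests and 50 occupational formulae (figures within the ranges Hull mentions). The weights and scores are invented placeholders, and the names `forecast_all` and `minutes_on_650` are hypothetical. Each forecast is formed as a weighted sum of the test scores, the multiplications are counted, and the time is estimated at the rate of 5,000 multiplications a minute quoted for the IBM 650.

```python
# Illustrative sketch only: weights and scores are invented placeholders,
# not values from Hull or from any published battery.

N_TESTS = 30        # assumed size of the universal battery
N_FORMULAE = 50     # assumed number of occupational forecasting formulae
MULTIPLICATIONS_PER_MINUTE = 5_000  # rate quoted in the text for the IBM 650


def forecast_all(weights, scores):
    """Return one weighted-sum forecast per formula and the multiplication count."""
    forecasts = []
    multiplications = 0
    for formula in weights:
        total = 0.0
        for weight, score in zip(formula, scores):
            total += weight * score
            multiplications += 1
        forecasts.append(total)
    return forecasts, multiplications


def minutes_on_650(multiplications):
    """Time for the multiplications alone, ignoring additions and input/output."""
    return multiplications / MULTIPLICATIONS_PER_MINUTE


# Placeholder data: every test weighted equally in every formula.
weights = [[1.0 / N_TESTS] * N_TESTS for _ in range(N_FORMULAE)]
scores = [50.0] * N_TESTS

forecasts, count = forecast_all(weights, scores)
print(count)                  # 1500 multiplications, matching Hull's estimate
print(minutes_on_650(count))  # 0.3 minute, i.e. about 18 seconds
```

Under these assumptions the whole set of forecasts for one counselee would occupy the machine for well under half a minute, which is the point of the comparison with hand computation.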

It will take but a little imagination on the part of the reader to
envision the establishment of computer centers to which counselors 
could forward data obtained by test batteries to be punched on 
cards and fed into a machine such as the one described above. A 
machine programmed to operate on the basis of critical scores 
could then feed results to appropriate forms that would be returned 
to the counselor. 
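
A machine "programmed to operate on the basis of critical scores" would do little more than compare each score with a minimum cutoff. The sketch below assumes two invented occupations with hypothetical test names and cutoffs; `meets_critical_scores` and `screen` are illustrative names, not part of any published scoring service.

```python
# Hypothetical critical (minimum) scores for two invented occupations.
CRITICAL_SCORES = {
    "clerk": {"verbal": 45, "numerical": 50},
    "machinist": {"spatial": 55, "numerical": 40},
}


def meets_critical_scores(scores, requirements):
    """True when every required test score is at or above its critical score.

    A test the counselee did not take is treated as a score of 0.
    """
    return all(scores.get(test, 0) >= cutoff for test, cutoff in requirements.items())


def screen(scores):
    """Return the occupations whose critical scores are all met, in sorted order."""
    return sorted(
        job for job, requirements in CRITICAL_SCORES.items()
        if meets_critical_scores(scores, requirements)
    )


counselee = {"verbal": 48, "numerical": 52, "spatial": 50}
print(screen(counselee))  # ['clerk']
```

The routine sorts the scores it is given and nothing more; a spatial score five points short of an arbitrary cutoff removes an occupation from the list regardless of anything else known about the counselee.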

It takes but little more imagination to visualize an extension of
the above wherein copies of all these results could again be placed
on punched cards and deposited in centers where they could be 
sorted mechanically for the purpose of assigning manpower in 
cases of emergency. The obvious limitation in all this mechanical 
processing is the fact that the machines can utilize only the data
that are fed into them; they cannot improve on them. And the
data, at this time, must be scores from imperfect tests. 

The danger inherent in the prospect described above is, of 
course, that counselors may find themselves increasingly victims of
expediency, of large numbers and of urgency. Instead of working
more and more in the direction of individualization, counselors
may succumb to group concepts underlying such approaches as
described above and may find themselves utilizing techniques
designed for purposes other than work with individuals. One of
the major tasks of the counselor as a consumer of tests, then, will
be that of resisting the temptation to adopt for his own use
methods and techniques that were developed for other purposes.

CONSUMER DEMAND AND TEST IMPROVEMENT 

Tests and use of tests may be improved as the result of . . .
sophistication and consequent demands for higher . . . users. The
discouraging lack of progress to date may have . . . because test
users have not learned what to seek . . . publishers. Their failure to
require high-quality tests . . . huge production and sales of
instruments . . . what is claimed for them. It does not seem . . . a
significant improvement until the level of sophistication . . . users
has been raised. As Seashore has pointed out, ". . . growth of good
. . . consumers and test-producers . . . Considerable progress along
these . . . The professional . . . are beginning to write . . .
standardization, and . . . is more important . . . the level of . . .
Increasing consumer . . . test authors and their . . ."

The discussion above of . . . future of developments in testing is
perhaps, as . . . the developments may occur singly or in subtle
combinations. Ultimately it would seem to the authors that the best
prospect for the future will lie in the increased sophistication of
test users. Such development seems likely to be slow.

SOME BASIC PROBLEMS IN TEST DEVELOPMENT 

Demand for higher standards by test users should, in the long 
run, result in the production of better tests than are now available.
Before they can become really useful, however, there are some
basic problems that must be solved. The most difficult one is that
of developing satisfactory methods of quantifying human behavior. 

In building a test it is common practice to select enough items to
fill up less than one hour of testing time. Each of these items is
then, by some mysterious process, allotted an equal value on a
scale. These scores on a scale are often totaled by a "Scoreze" or
"quick-scoring" device and the total is christened as "mental
maturity," "mechanical reasoning," "mental ability," or such words
as currently suit the contemporary pedagogical jargon. As such it
is proclaimed as a better measure of an individual than the
"subjective" judgment of a teacher who has observed a pupil daily in
his classes for a year or more. At times it may be, but the claim
that it is usually true lacks convincing evidence. Before such
evidence can be obtained it seems essential that the following
problems must be solved.

NEED FOR EXAMINATION OF BASIC CONCEPTS

It is conceivable that a test with high predictive validity might
be constructed by selecting items subjectively, scoring them as
though each item were of equal value, and christening the total
score. Currently, however, no one can really claim that an
instrument with high predictive validity has been produced by such
methods. The best results obtained so far are indicated by such




small coefficients of correlation between test scores and criteria that 
prediction of an individual’s later performance in the area chris- 
tened by the author is little better than chance. 
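
The phrase "little better than chance" can be made concrete with one conventional yardstick, the index of forecasting efficiency, 1 - sqrt(1 - r^2). It expresses how much a validity coefficient r reduces the standard error of estimate compared with guessing the criterion mean for every student. What follows is a minimal sketch, not a procedure given by the authors; the function name and the sample coefficients are illustrative assumptions.

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Proportional reduction in the standard error of estimate
    achieved by a validity coefficient r: 1 - sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 1.0 - math.sqrt(1.0 - r * r)


# Illustrative (assumed) validity coefficients, not figures from the text.
for r in (0.30, 0.50, 0.70):
    gain = forecasting_efficiency(r)
    print(f"r = {r:.2f}: error of estimate reduced by {gain:.1%}")
```

On this index a coefficient of .50 removes only about 13 percent of the error of a blind guess, which is one way of reading the complaint made above.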

Perhaps the time has come for test builders to reexamine their 
basic premises and techniques. Continuation of the usual timeworn 
processes of test construction that have proved to be almost sterile 
seems not to be justified. There appears to be little hope of
significant contributions to counseling by workers in the testing
movement until a testing Einstein arrives to shake up its very
foundations. The current way to appear scientifically and statisti- 
cally respectable is to follow the beaten path and to grind out 
again, with perhaps minor refinements, what has been endlessly 
ground out before. Sorokin, in an article that should be read by 
anyone who considers the use of tests, states the case very effec- 
tively in these words: 

Obsessed by metromania, our testers indefatigably measure their test 
data and present them in an "exact” and "objective” form of numeri- 
cal scores, indexes, statistical tables, marvelously decorated with im- 
pressive looking mathematical formulae and other simulacra of a 
precise quantitative research. Manufacturing of these "quantitative 
movies” is done so artfully that many a logically and mathematically 
innocent onlooker seriously takes this sham-quantitative appearance 
for a genuine reality. A legion of psychosocial researchers sincerely 
believe that these impressive looking scores, indexes, rows of figures, 
coefficients of correlation, probable errors, standard deviations, coef- 
ficients of reliability, and so on, deliver; but the objectively studied 
and exactly measured "diamonds" are but arbitrary, subjective, often
fantastic, assumptions of the testers dressed up in quantitative cos- 
tumes and mechanical make-ups. Our testing numerologists have as 
far relationship to real mathematics as had various numerologists and 
astrologers ("mathematici” as they were called) of ancient times 
and of the middle ages.* 

* Pitirim A. Sorokin, "Testomania," Harvard Educational Review, Fall, 1955,
25:199-213.





THE PROBLEM OF DEFINITION 

The problem of securing agreement among test producers and
consumers about definition of terms has limited, and seems likely
to continue to limit, the contributions of testing to counseling. The
definition of intelligence has always been subject to dispute and the
differences in definition have not been resolved by substitution of
the words "scholastic aptitude." The word "aptitude" seems to
have almost as many definitions as there are authors of aptitude
tests if one may judge by the kinds of items that appear in their
tests. Rulon has pointed out some of the problems and some of the
implications for counseling that result from the failure to clarify
definitions used in testing in the following words:

There appears to be relatively complete confusion as to whether
ability and aptitude are things which are changing or things which
do not change. Certain aptitudes are goals to be attained, while others
are determiners of goals. We try to make choices on the basis of some
abilities, and we try to develop certain other abilities such as to get
along with people and to use the scientific method. We even speak of
developing the ability to read. We believe that if a person cannot be
a good stenographer easily, we should encourage him to be something
else. But we believe that if a person cannot be a good citizen easily,
then we should accept the responsibility of developing him into a
good citizen anyway. Clearly the concept of achievement in relation
to ability needs clarification, and we should decide more definitely
what should be done about individual aptitudes. We need a ready
answer to the inquirer who may ask, "Why don't you improve the
aptitudes you find this individual deficient in? In the case of legal
aptitude why not develop it, just as we try to develop social
aptitudes?" *


Rulon goes on to point out that lack of clarity in definition of
terms makes a very great difference in actual counseling procedures.
He writes:

Here is a case stated in two ^d 

One says "We have many cases of discrepances betwe« ab.Uy and 

^ rtr annfhM- either mote ambition than ability or 

ambition m one way or another, e jc a bov 

more abUity than ambition." The other worker says, ^ 

who has failed all his academic work througho,d the g ^ 

and is now failing in all his acadenuc ->>1=^ - 
school, and yet he says his plans are to go mtn a bookrsh oceup 

after college training.” . educa- 

These two "<=*ers agree that *eyjeF«^ ambiguous 

tional problem in different wor . 

entity concept of ability j ability do not jibe.” 

^bt;: jrj: i:;:cr ^vr. hU about this, nor very 

'Xi:^sseetowhat..««-r^ 

fec^ra L'td^hryoXe said A 

consider these things in relation °*en J ^ 

something about them. You can change your 

You can change ^ „( ico,abIe. Your performance 

plans, or you can expect to 
is not consistent with your plans- 

If school personnel will bred l^^tf tesB in the 

union there ^e ^ c^^^L" of stndents hecanse 
counselmg process. Th ^ understandmg 

they are "not wotkrng up -Vorking beyond his 

of that curious phenomeno , ^ j t,els to students 

ability.” These will be '-/“'‘“^aneldy Ls^^ 
and assume that they have P themselves, provide 

be mote realization apm^tion of the fact that 

sufficient data for counseling and more appreoa 




they do not automatically determine the action that a counselor or
his client should take.


SUBJECTIVITY AND OBJECTIVITY


Perhaps little progress in the testing movement can be made
until test users stop leaning too heavily on the word "objective."
Rothney and Roens have commented on this problem in the
following words:

The previously noted tendency to make all educational procedures
seem standardized, and statistical has obscured the fact that
most objective techniques have required a good deal of subjectivity
in the process of their construction. Examination of test manuals
reveals that all the authors have made some subjective judgments. If
the authors of the manuals took their items from textbooks, they had
to choose among many whose authors had, in turn, made subjective
decisions about the materials they selected. The authors of tests must
have made judgments concerning the kind and number of items to be
selected, and even if they used statistical criteria to guide them, they
had to decide which of many criteria they would use. They had to
choose their scoring methods and attach values to the items. Again,
if they were chosen by statistical procedure the authors had to make
selections from many. They made choices among criteria against
which to validate their tests and they were required to decide how far
they would go before they were convinced that validation was
adequate. The "objectivity" of a test is found in scoring procedures
(again derived largely by subjective techniques or choice from
so-called objective techniques) that make it possible for two scorers to
get the same results. The student who is aware of all these problems
will not be too easily influenced in the selection of techniques by the
fact that one technique is said to be objective and the other subjective.
Completely objective techniques (except in the very narrow concept
of scoring items) for obtaining a minimum list of items to use in
counseling are not now available.*


* John W. M. Rothney and Bert A. Roens, Counseling the Individual Student.
New York: Dryden Press, 1949, pp. 85-86.



It seems unlikely that there will be much progress in the testing
movement until test users begin to recognize the oversimplification,
confusion, and misrepresentation that occurs when an "objective"
label suggests to them that tests have greater exactness and
scientific rigor than they do.

THE EMPHASIS ON SPEED 

As one reads test manuals and literature ^ 

observe that speed seems to be of the essence. The CaWomu T=s 
Bureau in a cover design that came -from the ofBces of ^ymond 
Loewp Associates, internationally famous designers 'uch Pro 
ucHs the Studebaker car, Schick 
erator," announces in screaming that its scor^ee 
is EA:sIER of administration, BETTER for “ 

FASTER in scoring. Otis’ tests seem 

they are one 

' ilrorl as rapidly as possible h difhmlt » 

performance of pupils is hurried, if the J ' j jp 

Lted so diat they can be done m one 

ing is to be done so quickly “t wonder how 

tion of pupils’ answers is not possible, one must 

dependable the results can be^ j 

When test users mvaZty, reliability, or adequacy 

administration or scoring sec u mp a more useful proce- 

of norms, testing for counseimg may ^c - a m.e - ^P^ 

dure. The provision of o^ei^phash on speedy 

Ueved much dmdgcry, from examination 

"objective" testing. An ® j American education was to 

of our tests that the major objective ot 
develop true-false and multiplecho.ce mmds and 





THE STATISTICAL SITUATION 

It is difficult to determine whether discussion of this topic should
be placed in the section on limiting or promising factors in the
improvement of the testing situation. It seems that at some time in
the near future educational statisticians must discover that they are
working with human beings rather than plants in fertilized plots,
or balls drawn out of jars. Until they begin to realize that they
have the special problem of dealing with the most complex thing
in the world, a human being growing in very complex circumstances,
there seems to be little hope of practical contribution of
statisticians to the counseling process.

There can be no doubt that statisticians have elaborated the
obvious fact, probably known by instructors since teaching began,
that there are individual differences among groups of students.
Whether the esoteric language used in the elaborate quantification
of the facts of individual differences has contributed significantly
to improvement of counseling of students is a question on which
there could be much difference of opinion. When the counselor
attempts to use the products of the statistician he finds that, by the
time he has allowed for the assumptions involved, made adjustments
for differences in circumstances of his own counselees,
discounted the difficulties in conversion of raw scores to theoretical
scales, and tried to translate the resulting product into language
that a counselee or his parents can understand, he has little that
has been worth the effort.

And statisticians, by and large, seem to be more concerned with
fleeting moments in a child's life than with his development over a
period of time. Of the thousands of research studies only a very
few have used longitudinal data. The usual practice is to take a
quick sample of a child's behavior with a test that takes less than
an hour of his time, and then to hurry the scores to the computing
machines for correlation or for one of the currently popular modes
of analysis: factor, cluster, discriminant, or dispersion. In the
process the data are normalized or distorted in some way so that
they can be manipulated more easily. The findings usually indicate
the general characteristics of a group of subjects and it is often
implied that they are rather permanent. The results are then
published and the researcher turns away to another problem with
another group of subjects whose test scores are ” 

new way that is currently popular. Statisticians rarely undertake a
thorough study of individuals over a long enough ^ 

to determine whether or not their 
those who work with growing human 

studies encompass many data about persons over P““^ “ _ 

their growth does it seem likely that they will contribute significantly
to the processing of data for counseling purposes.

Changes in practices in a profession muy ^ 

growth. Sometimes it means only a floundermg 
I retrogression. At still other times it ““P ® 

that disforts and destroys. The historian who 
measurement scene must have noticed the ^ 

laticn coefficient, chi-square. analysis ei 

factor analysis, and discrimmant ai« lysn in ^s 

popular it is applied to every I»ssi derived, 

their origin and the purposes or w lather simple 

Borrowing always from “donal statisticians apply 

sciences to get under their hal » bring 

their methods to the comp ex 5 translating ex- 

little of value to the counsel . , weight of pigs, or 

perimental designs dealing wi ^dance framework 

effectiveness of manurial treatment mto a guidance 

has not yet been resolved. 


TESTING AND FREE SCHOOLS 

The literature of testing suggests that testers would find their
best working conditions in a regimented rather than in a democratic
state. In a country in which a state or federal administration
regulated the curriculum, the hours to be devoted to it, the policies for
pupil placement, promotion, or retardation, and in a nation in
which workers could be allocated to jobs without respect to their
personal wishes, the testing movement might be highly effective.
As "national" norms are now presented it is assumed that every
subject's scores can be compared in a reasonable manner to those of
a supposedly representative group despite the exceptional assets
or liabilities in the situation in which he finds himself. Some forty
years after the testing movement began one organization is finally
planning to study the influence of local curricula, school offerings,
and quality of teaching on test scores.* It should be noted that the
Educational Records Bureau has used separate scores for private
schools for many years.

In the United States local schools are given much freedom in
choosing the educational opportunities they will provide, and it is
known that great differences in offerings result. The process of
comparing a youth's scores with a set of national norms, and
making deductions about him on the basis of the comparison,
implies that he has had comparable learning opportunities. To
make such comparisons when it is obvious that he has not had such
opportunities seems hardly to be a sound procedure. In a regimented
state where markedly similar opportunities could be
ordered, and where individuality was not important, the testing
movement would seem to serve well. In view of the great differences
in practices in American education, however, the professed
claims of test authors that testing aids in better placement and
guidance of students seem to lack merit. Until the test authors
provide test users with data that will enable them to give proper
weight to such factors as the quality of instruction and the kind of
home, school, and community experiences to which the testee has
rSLtS'sL';:®?"''’'"”''- ^ “«■ 




been exposed, there seems little hope that tests can really be used 

It is probable that, when the testing movement comes of age and
testers recognize that differences in educational opportunities will
always appear in a free society, test manuals will contain a statement
such as this: "The following factors contribute to
ment of certain scores on this test. Hiey contnbute to 
in these proportions.” This statement will be followed w * a 
weighting of such variables as quality of 
tional materials, amount of time spent on e su J ’ 
motivation, and the physical condition of the stud^t. Un^ su* 
data are available, hLever, the counselor 

ment about the significance of a test score. Escape from this 
d jlf m rp^ does DOt Seem 

CURRENT SIGNS OF PROGRESS 

It has been indicated several times f 

prediction of behavior of humaris is a “y gj jj, 

Ldingly no attempts will bemad^topm*ct^^^^^^ 

One may, however, note some of th Ikted and dis- 

indicate promise of improvement Four of them are listed and dis 
T“;ion of new tests that indicate more careful attempts 

at standardization ^d va'idyuu- distribution of tests. 

’ 2. Publication of standards ^ reviews of tests may 

3. Provision of methods by which critical reviews 

reach the potential consumer. ^nlnvment of ^id* 

4. Raising of standards for trainmg and employment gu 

ance workers. 


PUBLICATION OF BETTER TESTS

At times, despite all the difficulties mentioned above, some rays
of hope appear. One of them is the trend away from omnibus tests
of intelligence, mental maturity, or mental ability and toward
assessment of several kinds of performances by means of a test
battery standardized on the same population.* It appears that the
era of the "Intelligence Quotient" is now rapidly passing.

The lumping of performances into one whole score called an
"IQ" has served the very useful purpose of making educators
aware of individual differences in children for so long a period that
it is now ready to be pensioned off. Scores on several separate
performances are likely to take the place of the IQ on cumulative
records. This is not a new development since several writers have
for a long time advocated the breaking down of omnibus scores. In
the haste in which testers always seem to be, however, they have
taken such small samples of student performance that the scores
are not likely to be dependable. In the future it seems that there
will be more use of batteries of tests with titles that clearly indicate
their content, and will sample enough of a student's various
performances sufficiently to permit comparisons.

Comparison of the manual of the Differential Aptitude Tests **
published in 1947 and one of the older single score tests, such as
the Henmon-Nelson Tests of Mental Ability,** published in 1932,
reveals some startling contrasts. The data on the older test were
presented in two pages but 77 pages are required to present several
hundred validity coefficients, standard error and reliability data,
and fairly adequate description of the norms and standardization
populations of the newer test. The authors of the Differential
Aptitude Tests have also published a follow-up over a seven-year

* The use of "Intelligence Quotient" without the word "score" attached
to it was unfortunate. The addition of the word might have reminded those
who used it that what was obtained from a test was a fallible score, not an
infallible index of true mental organization.

( Bennett. HmoWG. Schore. wd Alexander G. Wesnun. A AGn.s/ 

OT2 r„„, N™ Y«,kt Thn PtkchnloglT&tpor.tion. 

re'lXv AlewSS.*?* '’t fne 'B' H-nmm NAm 

I esis of Mental Ability, Vankm, N.Y.: World Book Co , 19 J2. 




period that purports to establish the long-term predictive efficiency 
of its subtests.^* 

The pattern of complete reporting of the D.A.T. has been
followed with a greater or lesser success by several succeeding
multiple aptitude tests, some of which, like the Multiple Aptitude Tests
of Segal and Raskin,* attempt to provide "differential intelligence
score" norms in addition to the usual grade and sex norms. Both
of these tests have encouraged the user to employ the concept of
expectancy tables as a way of interpreting the predictive validity of
their scores.
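
An expectancy table of the kind mentioned above can be built by grouping students into score bands and recording what proportion in each band later met some criterion. The sketch below is only an illustration under stated assumptions: the scores, the pass/fail outcomes, the band width of 10 points, and the function name are all hypothetical and are not taken from any test manual.

```python
from collections import defaultdict

BAND_WIDTH = 10  # assumed width of each score band


def expectancy_table(scores, succeeded, band_width=BAND_WIDTH):
    """Group test scores into bands and report, for each band, the number
    of students and the proportion who later met the criterion."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [n students, n successful]
    for score, ok in zip(scores, succeeded):
        band = (score // band_width) * band_width
        counts[band][0] += 1
        counts[band][1] += int(ok)
    return {band: (n, k / n) for band, (n, k) in sorted(counts.items())}


# Hypothetical data: aptitude scores and whether each student later
# earned a passing grade in the related course.
scores = [42, 47, 55, 58, 51, 63, 66, 69]
succeeded = [False, False, True, False, True, True, True, False]

table = expectancy_table(scores, succeeded)
for band, (n, proportion) in table.items():
    upper = band + BAND_WIDTH - 1
    print(f"{band}-{upper}: {n} students, {proportion:.0%} successful")
```

With so few cases per band the proportions are unstable, which is the same caution the chapter raises about small samples of performance.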

Other innovations among recently published tests include such
items as attempts to establish occupational ability patterns in the
General Aptitude Test Battery of the United States Employment
Service; the use of a standard score reporting system in the
Multiple Aptitude Tests; the plan to continue the revision of norms
and continue validation studies with the same group of boys and
girls used in the preliminary validation studies presented in the
Flanagan Aptitude Classification Tests; the purported comparable
growth scores in the School and College Ability Tests; the
profile reporting scheme of the School and College Ability Tests
that allows for graphic presentation of standard error concepts;
the thorough norm data of the revised Stanford Achievement
Tests; the effort to measure essay writing and listening
comprehension in the new Sequential Tests of Educational Progress; and

_ ^ Prsllnsar.ljD.” Tw/ ScrVICC 


..crv r*AT_A Seven- Ye»r Follow-Up.” Test Service 
George K. Bennett, The D^. • oyrDorstion, November, 1955- 

Bul!e„„, No. 49. New York: The Aptitude Tests. Los 

David Segal and Evelyn Raskin. f 

Angeles: California Test ® a. Xest Battery. Washington, DC: De- 
” Beatrice J. Dvorak. General 

partment of Labor. United States i/ • Supplement. Chicago: 

» John C Flanagan. Aputude Class, feat, on Tetts. Techmeat iupp 
Science Research Associates, 1953. , . r Cooperative School and 

Cooperative Test Division. Examtne/s Manual for toe 

World Book Co., 1953. - ^ Cooperative Sequential Tettt 




reports to their users that go beyond a sales promotion approach
and furnish information concerning general measurement concepts
as well as empirical studies dealing with particular test scores and
their uses. Typical documents available upon request to the
publisher include the Test Service Bulletins and the Test Service
Notebook series of the World Book Company, and the Test Service
Bulletin series of the Psychological Corporation.**

Local school systems are attempting to improve the use of
testing materials by trying out tests in their own schools and by
providing in-service training programs in measurements and guidance.
Engelhart has quoted a letter that shows how this may be done. It
is reproduced below.


In each elementary school I [Miss Eloise Cason, Child Guidance
Director, Bloomfield, N.J., Public Schools] have a meeting with the
staff and/or new teachers on how to "read" a permanent record card.
Records of the children in the school are duplicated and used as a
basis for discussion. It is my impression that the average teacher
learns most effectively about the use of tests by focusing on their
meaning for understanding a particular child. Test data are related to
other information. It is also possible at these meetings to do a bit of
educating on validity, reliability, etc. Similar meetings are held at
other school levels.

In the junior high school the guidance staff prepares summaries of
test data to help the teacher plan for the group with which she is
working.

When new tests are introduced in skills or the content fields, the
teachers concerned are asked to evaluate the suitability of the material
in the tests in relation to their program. I believe this procedure is
greatly appreciated, and removes some of the fear of tests.*


r.U S«.». Nu. tJ.Yo^N.Y., Wood Book Co. MarioB. DoluuW- 

Piodjctios Son... », Collju" r„. 5„i„ b,U,u,, No. SO. Yoolm NY: 
Wood Boot Coe Jra™ S- "Ho» Acev,,,. I. , Test T„/ 

Strm, No JO. New Yo*; Tb. Psyd.olostj.i Oupoi.Uoo. t9J6. 

" D Enelthut ‘I 

Rtstarcb, FebrMary, 1956, 26;>'l5. 





RAISING STANDARDS OF GUIDANCE WORKERS

The coming of age of a professional group is usually indicated
by the introduction of processes of accreditation or certification,
by the development of professional literature and research at an 
increasingly high level of competence, by recognition of established 
related organizations, and by increased acceptance on the part of 
the public. All these processes seem to be in operation in the coun- 
seling movement. As they develop it seems likely that the workers 
in it will become intelligently self-critical about their activities. In 
the process it seems likely that the selection and use of tests will 
receive much consideration. 

The trend toward increased professionalization and more
adequate preparation of school counselors seems clear when it is noted
that 34 states now offer counselor certificates and at least seven
other states are in the process of drawing up such regulations.**
Analysis of the specific requirements for these various state
counseling certificates points up the fact that over 90 percent of them
include an area of "Analysis of the Individual" among their
requirements. They usually require at least one course dealing with
the use of tests and measurements in counseling the individual. As
more and more counselors-to-be have these preprofessional
training experiences it is hoped that their sophistication in the use of
test results will rise.

Other signs of increasing professional awareness among guid- 
ance workers include the publication in 1953 of the report entitled 
Ethical Standards of Psychologists by the American Psychological
Association and the first draft of a parallel code of ethics by the 
American Personnel and Guidance Association in 1957. These 
guides may help the counselor to raise his professional status and 

** See: Royce E. Brewster, "Guidance Workers' Certification Requirements," Guide
Lines, Washington, D.C.: U.S. Office of Education, 1956, 44 pp. Arthur J. Jones
and Leonard M. Miller, "The National Picture of Pupil Personnel Services in 1953,"
The Bulletin of the National Association of Secondary-School Principals, February,
1954, 38:105-159.




to remind him of the limitations of his technical tools in working
with counselees.*

Boih the American Personnel anri Guidance Association and the 
Amerrcan Psychological Association have influenced training re- 
To^d Th' association has 

sS ft •■^trans" at the subdoctoral level, and has 

wrotriai? , 7^“”* •he various 

“Sir ' 7 ■"* ?«•«" ™Sht well in- 

tio:“rrir,““^ 

worbhopheld in r Pt'^Paralron in this field.- At a 

Amerll Pell 

counselor-traCrrlltltl' 

the training of counselrtr. ^ several areas in 

more actual errpertaental d!'u 
program of training could be outlined 7°'' “ 
reached by Stoughton In writ' ’ ^ smrrlar conclusion was 
he said; "The scarcity of lacZh ™ P'fPa'ation of counselors 
should be viewed as a harsh chan" ••’““« and growth 

•mirwledge has increased so rapidivn “ wWth 

to be many workers whose ptofessio T “7'^h •*•“' are known 
limited." " P onal training is. to say the least, 

It appears that in the next few 

mentation in the areas of detemrll^* “P”'" 

“ Pnd„.=xi,n ® ““"ador competence and 

"Afflericm PsychoIo^cAl A - ‘ Amencaa Personnel 

“N,Uo„| A,»d.Uo„ c,;. Seprewber. 1955, 

tlX"' 

“I<.hcnW.S»»slu,^.Ti,p,^ . AnochUo.^ 




the kinds of preprofessional training experiences that will help 
him arrive at that stage of preparation will be done. 

THE SKEPTICAL COUNSELOR 

Having looked to the future and critically evaluated the present
in regard to the use of tests in counseling, just where do counselors
stand? Do they "throw the baby out with the bath water" and
disregard all test data until that day somewhere in the hazy future
when all reliability coefficients will be 1.00 and validity evidence
will be reported in coefficients of at least .97? Or do they admit
that these tools are not perfect and plan to continue to use them
because they are the best available at the present time? Perhaps
there is another way to avoid these extremes of complete rejection
or unsophisticated acceptance.

The way may be found in the development of a healthy
skepticism, a critical and challenging search for data that offer evidence
of increasing efficiency of the tools with which they work. It is a
skepticism that would look critically at nontest as well as test
approaches to the understanding of the individual counselee. It would
not reject a wise and tempered "subjectivism" in favor of blind and
empirical "objectivity" simply because the latter uses the language
of mathematics and the former the seemingly less precise language
of description.

It may be the skepticism of a counselor who has learned to
question the blatant, merchandising claims of some test publishers and
who insists that the authors of tests furnish proof that they do at
least approximately what they claim to do. It is a skepticism that
encourages the counselor to make his own studies, to work out his
own validity, reliability, and normative data based upon test
performances of students in local school systems and to follow his
counselees throughout their school careers and on into post high
school training and work. It is a skepticism that takes nothing for




granted, and keeps him seeking dependable devices to help his 
counselees to help themselves to make sound decisions about their 
plans and actions. 

SUMMARY 

In this chapter an examination of the current status of the
testing movement and some projection into the future have been
attempted. The examination has shown that test authors are much
concerned with achieving statistical sophistication, but seem less
concerned with basic problems of rationale and validation. It has
been suggested that the pace of production of new tests based upon
timeworn procedures should cease until a careful examination
of the very basis of present-day psychometrics has been made.
Future developments in test construction and design have been
considered and some promising current "new" procedures have
been discussed. The greatest hope for the immediate future was
seen in increased consumer sophistication as an antidote to
high-pressure sales promotion campaigns of test publishers. This
sophistication and increased professionalization among school guidance
workers were seen as necessary if counselors are to find tests useful
in the important tasks that they undertake.

Exercises

1. Examine the catalogues of five major test publishers. Compare
their stated policies on restriction of sales of tests to potential
buyers.

2. Compare a test manual published in the 1920's with one
published in the 1950's. What are the major differences?

3. In a chart compare and contrast the strong and weak points of
three modern "multiple aptitude tests." Use the criteria for test
evaluation developed in Chapter V for this exercise.

4. Contrast the basic philosophical differences in point of view of
guidance programs in England and France with typical guidance
programs in the United States.

References 

American Educational Research Association. "Educational and
Psychological Testing," Review of Educational Research, February,
1956, 26:1-110.

American Educational Research Association and National Council on
Measurements Used in Education. Technical Recommendations for
Achievement Tests. Washington, D.C.: National Education
Association, 1955. 36 pp.

American Psychological Association, American Educational Research
Association, and National Council on Measurements Used in
Education, Joint Committee on Test Standards. "Technical
Recommendations for Psychological Tests and Diagnostic Techniques,"
Psychological Bulletin (Supplement), March, 1954, 51:1-38.

Anastasi, Anne. Psychological Testing. New York: Macmillan,
1954, 682 pp.

Brewster, Royce E. "Guidance Workers' Certification Requirements,"
Guide Lines. Washington, D.C.: Office of Education, 1956, 44 pp.

Buros, O. K. (Ed.). Fourth Mental Measurements Yearbook.
Highland Park, N.J.: Gryphon, 1953, 1163 pp.

Educational Policies Commission. Manpower and Education.
Washington, D.C.: National Education Association, 1956, 126 pp.

Ellis, Albert. "Recent Research with Personality Inventories," Journal
of Consulting Psychology, February, 1953, 17:45-49.

Kuder, G. Frederic. "Expected Developments in Interest and
Personality Inventories," Educational and Psychological Measurement,
Summer, 1954, 14:265-271.

Ross, C. C., and Stanley, Julian C. Measurement in Today's Schools.
Revised. New York: Prentice-Hall, 1954, 485 pp.

Rulon, Phillip J. "On Concept of Growth and Ability," Harvard
Educational Review, Winter, 1947, 17:1-9.

Sorokin, Pitirim A. "Testomania," Harvard Educational Review, Fall,
1955, 25:199-213.



Thorndike, Robert, and Hagen, Elizabeth. Measurement and
Evaluation in Psychology and Education. New York: Wiley, 1955,
575 pp.

Travers, Robert M. W. Educational Measurement. New York: Mac- 
millan, 1955, 420 pp. 



APPENDIX 


Activities Reports


It was suggested in Chapter VIII that requiring counselees to
respond to long lists of interests, activities, or questions designed to
measure personality characteristics is not likely to provide valuable
data for counseling. The practice is, however, so common that it is not
likely to be given up. In view of this situation it is suggested that
counselors who feel that they must have such lists should make up
their own in a form that permits local reference, does not
force choices where the counselee has no real choice or information,
permits him to state that he has had insufficient experience with an
activity, discourages attempts to fake responses (since the items he
marks are to be discussed during interviews), and provides
opportunities to indicate activities that are very important to him.

The following Activities Report is presented to show how this may
be done. All the listed activities were obtained from interview notes
entered on the cumulative records of subjects of the Wisconsin
Counseling Study.* The items were put into series so that they cover
consecutively the areas of reading, sports, music, school clubs,
out-of-school organizations, collecting, hobbies, care of animals,
domestic duties, art, and miscellaneous. A general unclassified section
appears at the end of the lists, and a supplementary report on
activities is required on the last page.

* John W. M. Rothney. Guidance Practices and Results. New York: Harper &
Bros., 1958.

The counselor may choose to summarize a counselee's activities on
the Analysis of Report on Activities in preparation for interviews.

The Activities Report is not intended for use as it appears here.
It is offered as a suggestion of a method that counselors might use
in their work. They may, of course, want to change the time intervals
used in this sample. The effort to construct an instrument of this kind
should reveal to the counselor some of the inadequacies of the
expensive and unrealistic devices commonly employed. He may find that
the instrument he develops is more useful in counseling than those he
has purchased in the past.


ACTIVITIES REPORT

NOTE: Where the activity is seasonal, such as ice
      hockey in the winter, answer in terms of
      almost every day, a few days a month, two
      or three times a year, or never during the

EXAMPLES: Hanging around the local dairy bar (named)
          Collecting stamps

1R. Reading mysteries

2S. Playing football

3M. Playing a musical instrument

23R. Reading sport stories

24S. Playing basketball

25M. Playing in an orchestra

26K. Attending meetings of Y-clubs

41P. Raising rabbits

42D. Knitting

43A. Decorating own room

44X. Writing poetry

66X. Working with a chemistry set

67R. Reading about ways people make their living

121R. Reading nonfiction

122S. Fishing

123R. Listening to radio serials

124K. Attending meetings of the Junior Red

Attending meetings of a Horsemanship

Attending meetings of a Nature Club

160C. Collecting pennants

161X. Traveling

177X. Daydreaming
178X. Watching television
179X. Going to movies
180X. Watching sports
181X. Going to parties

200X. Clerking in a store

Other kinds of books or magazines you have read

Did you have a job during the past summer?

Part-time        Full time        How long? (Weeks)

Type of job        Describe what you actually did on the job

If you don't have a job now, do you expect to get one?



REPORT ON ACTIVITIES

Supplementary Information and/or Counselor's Comments:





INDEX OF NAMES


Abt, L. E., 318 
Adams, G., 62 
Allport, G. W., 227 

Anastasi, A., 47, 113, 126, 140, 149,
343, 351

Anderson, G. L., 318, 321
Anderson, G. V., 23
Anderson, H. H., 318, 321
Andrew, D., 62


Barthol, R. P., 318
Bayley, N., 142
Belio, 248
Bellak, L., 318
Bennett, G. K., 23, 86, 100, 340, 341
Benton, A., 293
Berdie, R. F., 246, 278, 283
Berg, I. A., 106
Berkshire, J. R., 283
Bittner, R. H., 113
Bixler, R. H., 249, 269
Bixler, V. H., 249, 269
Bond, G. L., 114
Bordin, E. S., 268, 269, 293
Brewster, R. E., 347, 351
Brookover, W. B., 244
Brown, C. W., 113
Buros, O. K., 36, 47, 149, 282, 314,
344, 351

Caplow, T., 246, 260 
Carter, R. S., 237, 280
Cattell, Jaques, 119
Cattell, J. M., 1JS»
Cattell, P., 126
Cattell, R. B., 289, 303
Christenson, T. E., 41
Coleman, J. C., 149
Cook, W. W., 224
Cottle, W. C., 107, 113
Courtis, S. A., 224
Cronbach, L. J., 23, 47, 63, 113, 149
Crowder, N. A., 149
Cruze, W. W., 63
Cureton, E. E., 113, 149, 343

Danielson, P. J., 6
Darley, J. G., 25, 288, 290
Davis, A., 342
Dewey, J., 15
Dolansky, M. P., 346
Doppelt, J. E., 25, 149, 346
Douglas, H. R., 237
Drake.;.. 113
Dreger, R. M., 149
Dressel, P. L., 269
Duran, J. C., 90
Durnall, E. J., Jr., 318
Durrell, D. D., 133, 149
Dvorak, B. J., 13, 327, 341

Eells, K., 244, 280, 342
Ellis, A., 318, 343, 351
Englehart, M. D., 345, 346
Eron, L. D., 318

Faries, M., 268 

Huagui. J. C, «. 115. 511 

Foib.1, r-, 101 
Fmt L. 1C, 2»8. 510 
Franw F. S, 47. HO. 545 
F>i<l=. B. G, “0 
Furst, E. J., 289

Gardner, E. F., 224
Garrett, H. E., 141, 149
Gaylord, R. H., 113
Ghiselli, E. E., 113, 318
Goodenough, F., 47
Guilford, J. P., 128, 229, 232, 280, 318
Gustad, J. W., 280

Hagen, E., 150, 343, 352
Hahn, M. E., 286, 287
Hamlin, R. M., 319
Havighurst, R. J., 247
Heimann, R. A., 227, 245, 2S0. ^9.


Henmon, V. A. C., 340
Herzberg, F., 88
Hildreth, G. H., 47
Hilkert, R. N., 96
Hollingshead, A. B., 246, 280
Hollingshead, B. S., 246, 280
Hotelling, H., 128
Hull, C. L., 232, 280, 325, 327
Humphreys, J. A., 62
Hunt, H. C., 251

Jacobs, R., 287, 309
Jenkins, J. G., 114
Johnson, G. H., 298
Johnson, R. H., 114
Jones, A. J., 26, 63, 114, 347
Jordan, A. M., 62

Kelley, T. L., 128, 150, 341
Kirk, B. A., 47, 268, 343
Knezevich, S., 150
Kornhauser, 5. 1.. 293
Kuder, G. F., 319, 351

Liitin.Y.J.47
Learned, W. S., 237
Lecky, P., 227
Lennon, R. T., 114, 126, 134, 150, 346
Leotiari, N., 244, 280
Lincoln, B. A., 290
Lindquist, E. F., 42, 71, 79, 106, 113,
114, 115, 150
Lindzey, G., 319
Loeb, M. B., 247
Longstaff, H. P., 293
Lorge, I., 114
Lyton, W. L., 309

McCabe, G. E., 252, 280
McCall, W. A., 224
MacLean, M. S., 286, 287
McNemar, Q., 245
MacQuarrie, T. W., 83, 98
Manuel, H. T., 114
Mathewson, R. H., 26
Matteson, R. W., 269
Meehl, P. E., 113, 267, 280
Merrill, M. A., 123
Miller, L. M., 347
Morse, W. C., 288
Murphy, G., 227
Murray, H. A., 321

Nelson, M. J., 340

Olson, N., 257
Otis, A. S., 141, 150, 228

Patterson, C., 114
Patterson, D. S., 293
Pheannao, L. T., 246
Pierce-Jones, J. A., 107
Polbdi; A. B.. 114
Prator, R., 251


Rapaport, G. M., 106
Raskin, 34t
Raths, L., 314
Remmers, H. H., 114
Ricks, J. H., Jr., 313
Roeber, E. C., 297
Roens, B. A., 4, 5, 9, 15, 26, 114, 238,
252, 280, 307, 315, 334
Rogers, C. R., 267, 282
Ross, C. C., 36, 343, 351
Ross, B. E., 119
Rothney, J. W. M., 4, 5, 6, 9, 15, 26,
99, 114, 150, 208, 224, 238, 248, 252,
254, 268, 269, 280, 289, 304, 307,
308, 315, 319, 334, 343, 343, 353
Ruch, F. L., 25
Rulon, P. J., 150, 224, 332, 351


Sanderson. H. 2., 250 

Schenk, Q. L., 245, 280
Schmidt, L. A., 130, 319, 345
Seashore, H. G., 100, 313, 321, 329, 340
Segal, D., 341
Selby, ?. O., 114
Shaffer, L. F., 314
Smith, E. R., 205, 224, 237, 280
Sorokin, P. A., 331, 343, 351
Spencer, L. M., 11
Stol^ U, 246
Stanley, J. C., 36, 343, 351
Steward, V., 293
Stewart, N., 150
Stordahl, K. E., 319
Stoughton, R., 348
Strang, R., 103
Strong, E. K., 291, 319
Stuit, D. B., 114
Stuokel, E. R., 113
Sullivan, E. T., 128

S«P-r, D. E, 17. 2S, 47, 103, n. i;,, 
268, 290, 342, 343 




Terman, L. M., 123 

Thorndike, R. L., 26, 67, 70, 75, 79, 96, 
114, 150, 226, 280, 324, 343, 352 
Thorpe, L. P., 63, 287 
Thurstone, L. L., 128, 225, 228, 319 
Thurstone, T. G., 150, 228 
Tiedeman, D. V., 215 
Tiegs, E. W., 127, 128, 129 
Toops, H. A., 228 
Torgerson, T. L., 62
Traeger, C., 41
Travers, R. M. W., 72, 115, 134, 150,
343, 352
Traxler, A. E., 47, 62, 78, 93, 96, 104,
105, 106, 115, 141, 150, 225, 288,
321
Trinkaus, W. K., 309
Tyler, L. E., 268, 269, 286, 312
Tyler, R. W., 205, 224, 290

Warner, W. L., 246, 247 
Warters, J., 26
Wesman, A. G., 100, 230, 234, 280,
324, 340
Whisler, L. D., 114
Whyte, W. H., Jr., 12, 26
Wilder, C. E., 113
Willey, J. M., 62
Windle, C. D., 319
Wingo, J. M., 288
Woellner, R. C., 268
Wolfle, D. L., 26, 246, 280


INDEX OF SUBJECTS 


Achievement, academic, 229, 234, 237-239, 244

Activities reports, 353-370
Alienation, coefficient of, 229
American Council on Education Psychological
Examination, 161, 228, 242, 263

American Educational Research Associa- 
tion, 153, 342, 344, 345 
American Personnel and Guidance Asso- 
ciation, 347, 348 
Ethical Practices Report, 347 
American Psychological Association, 40, 
48, 342, 348 

Ethical Practices Report, 347 
American Textbook Publishers' Institute,
Answer sheets, 105-107, 108, 109, 145-
Aptitude Tests for Occupations, 61

Behavior description, 252-255
Bennett Stenographic Aptitude Test,
Bennett Test of Mechanical Comprehension,
58, 84, 89, U6, 265
Bernreuter Personality Inventory, 302
Blyth Second-Year Algebra Test, 60
Bureau of Publications, 39

California Achievement Test, 242
California Test Bureau, 38, 335
California Test of Personality, 28^0^
California Tests of Mental Maturity, 57,
107, 108, 127-130, 210, 261
Cases, Art, 263-264
Barbara, 265-266
Bert, 258-260
Bill, 264
Bob, 260-262
Dave, 262-263
Ed, 255-258
George, 271-273
Jeff, 241-244
Jerry, 236
Mark, 263
Mike, 273-279
Raoul, 263
Shelia, 239-241
College Entrance Examination Board, 37
Committee on Diagnostic Reading Tests,
36
Computers, electronic, 328-329
Confidence levels, 188
Consistency, internal, 192-194
of performance, 65
trait, 227
Cooperative Test Division, 38, 161,
343

Counseling, objectives of, 4
Counselors, certification of, 347-349
questions asked by, 8
role of, 5-6
Cumulative records, 204

Davis-Eells Games, 342
Differential Aptitude Tests, 46,
84, 86, 87, 99, 100, 108, 112, 210,
230, 234, 242, 256, 259, 261, 263,
264, 277, 340, 341

Educational Records Bureau, 37-38, 91,
338
Educational Test Bureau, 38
Educational Policies Commission, 324
Educational Testing Service, 36-37, 10^
195
Eight-Year Study, 252

57. 327-329. 341
Follow-up studies, 214
Forced-choice techniques, 28«,


General Aptitude Test Battery, 41-42,
327, 3^2
Grades, 237-239, 244
Guidance workers, standards of, 347-350

Harvard University Press, 3*
Henmon-Nelson Test of Mental Ability,
46, 107, no. 245, 236, 259, 262,
277, 279, 340
Heston Personal Adjustment Inventory,
302, 303
Holzinger-Crowder Uni-Factor Tests, 57
Houghton Mifflin Company, 39

Information, sources of in counseling,
9-10
Institutions for advanced training,
questions asked by, 8
Interest inventories, critical evaluation of,
282-301
norms of, 310-313
reliability of, 308-310
validity of, 290-301
Interviews, 10, 282
Intrapersonal factors and test scores,
248-252
Iowa Algebra Aptitude Test, 58
IQ, concepts of, 125-128

K-Scores, 213
Kuder Preference Record, 56, 210, 282,
288, 293, 316, 318
Kuhlman-Anderson Test, 239, 261

Lorge-Thorndike Intelligence Tests, 57

MacQuarrie Test of Mechanical Abilities,
46, 84, 90, 98

Manpower utilization, testing in, 16-20,
314-325
vs. human development concepts, 10-11
Measurement, actuarial concept of, 15
Meier Art Test, 59
Mental Measurement Yearbooks, 117,
148, 282, 314, 342, 344
Minnesota Clerical Tests, 107
Minnesota Multiphasic Personality Inventory,
210, 282, 301, 312
Minnesota Paper Form Board, 84, 87
Motivation, and social class, 247-248
effects of, on test performance, 248-252
of pupils for testing, 99
Multiple Aptitude Tests, 341
Myers-Ruch High School Progress Test,
lOS

National Council on Measurements Used
in Education, 93, 153, 342
National Intelligence Tests, 107
Norms, characteristics of, 83-85, 196-202
in test interpretation, 81-82, 85-91,
135-138, 312-313, 338
local, 91-92
personality and interest inventory,
310-313

Objectivity, 334-335
Occupational Guide Series, 45
Ohio State Psychological Test, 228, 239
Otis Quick-Scoring Test of Mental Ability,
56, 98, 107, 126, 146, 228,
239, 2«

Parents, questions asked by, 7 
Percentile, in reporting to parents and
teachers, 81-83, 215
in test score interpretation, 208-209,
215-216
Personality appraisal techniques, norms
of, 310-313
reliability of, 308-310
validity of, 301-303
Pintner-Cunningham Test, 261
Prediction, 50-52, 216-219, 228-236
Primary Mental Abilities Tests, 46
Probability, 232
Profiles, as method of recording test
results, 210-211
test interpretation from, 211, 3li
Progressive Achievement Tests, 46, 277
Progressive Education Association, 252
Projective methods, critical evaluation of,
303-303
reliability of, 308-310
validity of, 304-308
Psychological Corporation, 39-40, 207,
344
Test Service Bulletins of, 64, 203,
230, 233, 234, 346

Public School Publishing Company, 38 



Reliability, 65-81, 187-192
and validity, 80-81
coefficients of, 67, 68, 72, 73, HO,
188, 308, 310
equivalent forms, 70-71
factors influencing, 71-75, 138-139
Kuder-Richardson reliability method,
70-71, 189, 190, 194
long-term, 81, 138, 141, 195
methods of determining, 66-70
of individual scores, 75-78
of speed tests, 67-69
split-half, 66-69, 73, 138, 308-309
test-retest, 69-70, 309-310
Rogers Test of Personality Adjustment,
282

Rorschach Inkblot Test, 282, 301 

School and College Ability Tests, 61, 74, 
95, 107, 160-203, 341, 343 
School marks, 229, 234, 237-239, 244

School personnel, questions asked by, 
7-8 

Science Research Associates, 38 
Scores, scaled, 214 
stability of, 78-80 
true, 74-75 

Scoring, machine vs. hand, 105-106
Selection, and counseling, 235-237
testing for, 18-21
Selective Service College Qualification
Tests, 40
Self-concept, 250-251, 289
Sequential Tests of Educational Progress,
341, 343
Short-cut methods, attempts to justify,
285-283
limitations of, 284-285
objections to, 288-290
Social class status, 244-247
SRA Clerical Test, 60
SRA Mechanical Aptitudes Test, S-*
SRA Primary Mental Abilities Tests, 46,
56, 88, 108, 256, 259, 277
Stability, coefficient of, 78-80, 309
Standard error of estimate, 2))
Standard error of measurement, 76-78,
140, 232
Stanford Achievement Tests, 210
Stanford-Binet, 46, 123, 126,
Stanford University
Statistical methods, 336-337
Stenographic Aptitude Tests, 68
Strong Vocational Interest Blank, 282,
286, 288, 291-292, 293, 311, 318
Students, questions asked by, 7
Studies, longitudinal, 227, 238, 336-337 
Subjectivity, 267, 334-335 
Survey of Mechanical Insight, 59 

T-scores, 214 

Teacher evaluations, 237-239, 240 
Technical Recommendations for Achieve- 
ment Tests, 94, 342 
Technical Recommendations for Psychological
Tests and Diagnostic Techniques,
48, 71, 93, 151-203
Terman Group Test of Mental Ability,
Terman-McNemar Test of Mental Ability,
123, 125
Test administration, 92-104, 19^196
directions for, 97-98, 142-144
importance of, 92-95
motivation of students through,
97-100, 268
observations during, 103-104
physical conditions of, 96-97
Test authors, 119-120
Test coverage, 33, 151“H2
Test development, assumptions involved
in, 123-131
basic problems in, J3^”4
Test evaluation, criteria for, 151-203
Test improvement and consumer demand,
329-330, 343-347
Test interpretation, implied validity in,
>5^1. 131-135, 268-271
Test materials, sources of, 36-42
Test of Basic Skills in Arithmetic, 58
Test publishers, 36-40, 120-122
commercial, 38-39
criteria for checking, 42-45
Test results and other data, 265-271
written reports of, 255-266
Test scores, 104-107, 144-147, 19V190
cumulative recording of, 204-209,
213-216
etiology of, 116-147
factors influencing, 119, 226-228, 339
in counseling, 130, 133-135
prediction from, 216-219
profiles of, 210-211
recording and reporting of, 204-221



Test selection, 27-36, 151-203
criteria of, 30-51
Test titles vs. content, 33-36, 122-131,
295-296
Testing, and free schools, 337-339
differential, 340-342
future of, 339-350
in Europe, 18
in selection, 235-237
industrial, 18-19
scope of, 10-14
Testing programs, mental ability tests
used in, 130-131
purposes of, 131-132
Tests, apparatus, 28, 29
consumer interest in, 329-330
cooperative, 239, 241, 259, 263, 318
development of, 322-339
format of, 28-29
group, 28, 29
group vs. individual, 31
individual, 28
information provided by, 229
joint use of, 219-221
locally constructed, 42
mechanical considerations of, 107-109
miniature vs. trait, 23, 32
nonverbal, 28-29
power, 28-29
professional acceptance of, 62-8>
recognition vs. demonstration, 33-34
reviews of, 344-345
single score vs. battery, 23, 31-32
speed, 28, 249, 335
standards for, 330-331
time-limit, 28, 29


trait, 28, 227
types of, 27-30
verbal, 28, 29
work-limit, 23, 29
Thematic Apperception Test, 282, 288
Thurstone Temperament Schedule, 302,
303
Turse Clerical Aptitudes Test, 72

United States Employment Service, 40,
327, 341
testing program, 40-42

Validity, 43-44, 131-135, 174-187
coefficients of, 49-65
concurrent, 52-53, 180-186
construct, 53-54, 186-187
content, 50, 174-179
group vs. individual, 55
of personality and interest inventories,
301-303
30-51, 37, 63, 379-580,
503, 331
Van Wagenen Reading Readiness Test,
108

Wechsler Adult Intelligence Scale, 242
Wechsler Intelligence Scale for Children,
207, 208, 261, 263
Wisconsin Counseling Study, 100
World Book Company, 39, 346
World War II, testing program in, 18

Youth, problems of, 2-4

Z-scores, 207



